Sentence processing is the study of how humans understand sentences. The central question is how we recognize and interpret compositional linguistic material—the larger, new expressions built up from smaller, familiar parts. Sentences are a major locus of novel composition, but so are phrases and even complex words. Sentence processing is generally successful and accurate, although it involves the integration of many distinct information sources. The core source is our long-term knowledge of the language varieties we use. This includes systematic, grammatical knowledge from which we generate words, phrases, and sentences; inventorial knowledge of specific items, from morphemes to multiword expressions; and conditions of use, like those pertaining to register, style, or personae. We also integrate our richly textured understanding of the world and how it works, an understanding which frequently extends to the workings of our fellow interlocutors and their intentions. This complex array of information must be rapidly coordinated on the scale of hundreds to thousands of milliseconds. The extent and complexity of linguistic expressions implicate the routine involvement of working memory. The pressure to understand promptly, along with pervasive ambiguity encountered in all languages, requires tapping a capacity to preallocate resources and predictively encode upcoming material.

History

It has long been recognized that grammatical rules and representations are a central part of a language user’s knowledge (Chomsky, 1957). For almost as long, psycholinguists have asked how a comprehender could assign such representations to sentences in real time, subject to the computational limitations that presumably characterize a language user with finite memory and resources [see Psycholinguistics]. Miller and Chomsky (1963) offered an early formal model of incremental grammatical analysis with a finite memory. Their model made testable psycholinguistic predictions: It predicted that recursive self-embedding structures should be difficult to recognize, as is the case with the notoriously difficult-to-parse doubly center-embedded structures in English (for example, “The article [that the professor [that everyone loves] had assigned] totally stumped the class”). This research paved the way for further empirically grounded theorizing about the cognitive mechanisms underlying real-time sentence processing. In a foundational textbook, Fodor et al. (1974) summarized the promise of this emerging research program. They noted a tension between modeling an “ideal sentence recognizer” and the need to situate such a model squarely within the bounds of what is known about human psychology. This perspective continues to define this research: The 50 years of sentence processing work that followed the publication of this textbook can be partially understood as a process of closing the gap between abstract models of sentence processing and the “psychological matrix” in which those models are embedded.

One key question for this emerging field was whether sentence recognition processes have properties that do not simply follow from the grammatical structure of language; in other words, whether sentence processing constitutes a natural object of scientific study (Fodor et al., 1974). Positive evidence for this comes from a large and still growing “body of [processing] phenomena that are systematic but not explicable within the constructs manipulated by formal linguistics” (Fodor et al., 1974, p. 369). The identification of such phenomena triggered the development of theories aimed at explaining them. These theories elaborated on different aspects of the cognitive mechanisms that allow sentence recognition. For example, Kimball (1973) articulated a model of how short-term memory was used to represent syntactic structure. His model predicted a range of processing effects, including ambiguity resolution preferences and different types of reanalysis difficulty (see also Frazier & Fodor, 1978). In related work, Frazier (1979) proposed a set of parsing principles as part of a broader theory of sentence processing. Her Garden Path Theory held that the comprehender sought to form a single, determinate analysis of the input at each point as quickly as possible. This led to comprehenders prioritizing less complex structures (e.g., minimal attachment) and more local structural attachments (e.g., late closure).

Kimball’s, Fodor’s, and Frazier’s models also advanced another key theoretical claim: that syntactic processing was relatively isolated from other processes involved in language comprehension. In other words, they claimed that syntactic analysis was modular in the sense of Fodor (1983). It was this provocative claim that drove much of the research in the 1980s, with researchers asking what extragrammatical factors, if any, could influence incremental sentence processing. There were vigorous empirical debates around the role of animacy, plausibility, verb subcategorization, frequency, and more. This research spurred the development of an alternative model of sentence processing, constraint-based parsing (MacDonald et al., 1994). This view rejected the idea of syntactic analysis as an isolated process and instead posited competitive, simultaneous interaction between multiple levels of representation as the central mechanism in syntactic recognition. Constraint-based parsing models also differed in another key aspect from the Garden Path Model. Whereas the Garden Path Model proposed that comprehenders were driven by the need to identify a single structure for the input, constraint-based parsing rejected this “serial” parsing assumption in favor of a “parallel” model. On this view, comprehenders activated multiple different structural analyses of the input, and ranked them by how well they satisfied a range of constraints (MacDonald et al., 1994).

At the same time, another important development was the turn away from characterizing syntactic analysis processes in vacuo to situating syntactic analysis in a discourse context (Crain & Steedman, 1985; Tanenhaus et al., 1995). In a foundational study, Tanenhaus et al. (1995) showed that real-time syntactic analysis was guided by the referential potential of different analyses in a visual context. Figure 1 illustrates the highly influential method they used. Structural analyses that permitted successful reference in context were prioritized over those that did not. This finding illustrated that the principles that determine how comprehenders recognize syntactic structure in real time cannot be purely derived from properties of syntactic rules or constraints: A structure’s utility in a discourse context partially determines its priority in incremental structural analysis, a view that comports well with constraint-based parsing models. In a parallel development, experimental evidence suggesting a central role for prosody in cueing syntactic analysis came to the fore around the same time, driving a significant amount of work on the syntax-prosody interface in sentence processing (Fodor, 1998; Frazier et al., 2006).

**Figure 1**
The visual world paradigm (Tanenhaus et al., 1995) and its descendants are a valuable technique for tracking the dynamics of sentence processing. (A) A miniature “visual world” depicts particular individuals, states, or eventualities. (B) Participants’ eye movements are recorded while they view and interact with the world. (C) Fixations to areas of interest are interpreted with respect to the onset of critical linguistic information. (D) They can be aggregated and normalized in time bins, for example, to generate a smoothed, continuous fixation proportion over time. By examining where critical conditions diverge, it is possible to estimate when information in the language signal influences the evolving interpretation. Pig and goat images adapted from Pizarro-Guevara (2020) with permission of the author.
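The aggregation step described in panels (C) and (D) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `fixation_proportions`, the 100 ms bin width, and the gaze samples are all hypothetical, and no smoothing is applied.

```python
from collections import defaultdict

def fixation_proportions(samples, target, bin_ms=100):
    """Proportion of gaze samples on `target` within each time bin.

    samples: iterable of (time_ms, area_of_interest) pairs, where time_ms is
    measured relative to the onset of the critical word (it may be negative).
    """
    looks = defaultdict(lambda: [0, 0])  # bin start -> [target samples, all samples]
    for time_ms, aoi in samples:
        bin_start = (time_ms // bin_ms) * bin_ms  # floor to the left edge of the bin
        looks[bin_start][1] += 1
        if aoi == target:
            looks[bin_start][0] += 1
    return {b: hits / total for b, (hits, total) in sorted(looks.items())}

# Hypothetical gaze samples pooled over trials (invented for illustration).
samples = [(-50, "goat"), (20, "goat"), (60, "pig"),
           (130, "pig"), (170, "pig"), (180, "goat")]
proportions = fixation_proportions(samples, target="pig")
```

Comparing such curves across conditions, bin by bin, is one simple way to look for the point at which they diverge.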

In the early 2000s, researchers began to explore the idea that incremental recognition of syntax could be viewed as a form of rational inference. On this view, the primary goal of the comprehender is to infer the most likely structure of the input, and they do this by updating a probability distribution over those analyses word by word as the input unfolds (Hale, 2001; Levy, 2008). Surprisal, the negative log probability of a word in context, is a key index of incremental processing difficulty on this model, since it indexes how much a single word should cause a rational comprehender to update their beliefs about the structure of the input (Levy, 2008). Surprisal correctly predicted a wide range of effects in sentence processing (Hale, 2001; Levy, 2008). This was an important shift in perspective: Surprisal Theory highlighted rational inference over syntactic structures as the key driving force in sentence processing, rather than memory limitations or referential coherence. This dovetailed with contemporaneous trends in the cognitive sciences about the role of predictive coding in perception (e.g., Clark, 2013).
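As a rough illustration of the quantity itself, the surprisal of a word w in context c is -log P(w | c). The sketch below estimates P from a toy bigram model, which is a strong simplifying assumption: the models of Hale (2001) and Levy (2008) condition on the full preceding context via probabilistic grammars. The corpus and function names are invented for this example.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count (previous word, word) pairs; '<s>' marks the sentence start."""
    pair_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[(prev, word)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts

def surprisal(word, prev, model):
    """Surprisal in bits: -log2 P(word | prev), using unsmoothed relative frequencies."""
    pair_counts, context_counts = model
    if pair_counts[(prev, word)] == 0:
        return math.inf  # unseen continuation: probability zero without smoothing
    return -math.log2(pair_counts[(prev, word)] / context_counts[prev])

# Toy corpus (hypothetical): "raced" follows "horse" twice, "fell" once.
model = train_bigram(["the horse raced", "the horse raced", "the horse fell"])
expected_word = surprisal("raced", "horse", model)
unexpected_word = surprisal("fell", "horse", model)
```

Under this toy model the less frequent continuation receives the higher surprisal, which is the direction of the effect that Surprisal Theory links to processing difficulty.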

The field of sentence processing later developed in several parallel directions, each of which explored the connection between sentence processing mechanisms and other areas of cognition. The link between syntactic processing and working memory became a prominent area of research, spurred by claims that sentence processing was subserved by a highly capacity-limited, content-addressable memory system (McElree, 2000) and by the Adaptive Control of Thought–Rational (ACT-R) model of Lewis and Vasishth (2005), which grounded syntactic processing in an associative cue-based memory architecture [see Working Memory]. This accelerated the discovery of a range of “grammatical illusions” and similarity-based interference effects in comprehension (Phillips et al., 2010). The Good Enough processing framework emphasized the role of fast, frugal heuristics in sentence recognition, by analogy to the literature on heuristic decision-making. This approach highlighted the various systematic ways that comprehenders might entertain nonveridical or nonliteral interpretations of their linguistic input (Ferreira & Patson, 2007). And the Noisy Channel model began to explore how comprehenders might deal with noise and errors in the input, highlighting the role of rational (Bayesian) inference in deriving the ultimate interpretation of a sentence (Gibson et al., 2013).

Core concepts

**Figure 2**
Numerous problem domains in the study of sentence processing are connected by fundamental questions about the acquisition and use of specialized linguistic knowledge in a capacity-limited cognitive architecture.

Processing complexity

Sentence processing theories aim to explain a range of basic observations about what kind of expressions are more easily understood than others or what kind of interpretations are more easily accessed than others. A few comparisons in English help to give a sense of this. For example, some classes of sentences are simply harder to understand than others, like sentence (1a) in comparison to sentence (1b). The brackets in both sentences demarcate a relative clause. In sentence (1a), the relativized argument “pig” is a grammatical object inside the relative clause, whereas in sentence (1b), it is a grammatical subject.

1a. The pig [that the goat’s kicking _] is stirring up trouble in the barnyard again.

1b. The pig [that’s _ kicking the goat] is stirring up trouble in the barnyard again.

The superficial difference in the strings nonetheless gives rise to, in sentence (1a) versus sentence (1b), more misunderstandings of who did what to whom and more markers of effortful comprehension, such as elevations in reading times, pupil diameters, or cerebral blood flow (especially during the processing of the bracketed region).

As another example, some kinds of sentences characteristically lead the reader astray: so-called garden-path sentences. Sentences (2a) and (3a) persistently encourage nonveridical interpretations (↝ p), ones which are not contemplated in counterparts (2b) and (3b). Unlike their counterparts, the garden-path sentences lack informative morphological cues to the target interpretation.

2a. Dr. S found the bones were missing from the crate.
(↝ she found the bones; ultimately FALSE because they were missing.)

2b. Dr. S found <that the> bones were missing from the crate.
(Co-occurrence of “that” with “the” signals an embedded clause, not a nominal.)

3a. As we embraced the diva belted out La vie en rose.
(↝ we embraced the diva; ultimately FALSE because we embraced each other.)

3b. As we embraced <they> belted out La vie en rose.
(Pronoun form signals a nominal expression in grammatical subject position.)

As a final example, many sentences are logically ambiguous, but one interpretation tends to prevail. In sentences (4a) and (4b), the pronoun “he” can corefer with either of the previously named individuals (Lyle, Leonard). But there is a palpable urge to construe it as referring to Lyle in sentence (4a) and to Leonard in sentence (4b). Here there are no obvious syntactic differences between sentences (4a) and (4b) but an important difference in how the meanings of the verbs “annoy” and “love” are mapped to grammatical roles: A source of annoyance is expressed as a grammatical subject in English, but a source of love is a grammatical object.

4a. Lyle annoys Leonard because he’s always showboating. (“he” probably = Lyle)

4b. Lyle loves Leonard because he’s always showboating. (“he” probably = Leonard)

Examples like sentences (1)–(4) illustrate several of the bread-and-butter issues in sentence processing: Is processing difficulty coupled with grammatical complexity? Why do comprehenders fall into the traps set by local ambiguities, and how do they get out of them? What types of information drive the resolution of dependencies, like the ones illustrated in example (4) between the pronoun “he” and its referent (Lyle or Leonard)? The last question is a particularly important one because sentences are, by anyone’s account, complex mental objects. In the broadest sense, they constitute mappings between phonological forms and semantic structures. Thus, the internal logic of each of those levels can affect the structure of the sentence. And each of those levels is itself layered and intricate. Therefore, when we are processing a sentence, we are rolling up information from many sources, including individual phonemes, subparts of words, words themselves, the syntactic rules they participate in, and the meanings they contribute. At the same time, we are integrating that information with the broader discursive or dialogic context in which the sentence occurs. Despite the intricacy, sentence processing is mostly effortless, a feature that all theories must explain.

Finally, an important question arises about whether the sentence processing systems we find in humans share any properties that are independent of particular languages or, more precisely, that are independent of any individual’s particular linguistic knowledge. In other words, are there universally applicable mechanisms or principles of sentence processing? How closely tailored are processing routines to the grammars of particular languages? Until the past 15 years, psycholinguistic research had focused on a narrow slice of the world’s linguistic diversity, so we largely remain in the dark on the question of universality.

Returning to the contrast between difficult object relative clauses, sentence (1a), and easier subject relative clauses, sentence (1b), this has sometimes been identified as a pervasive, perhaps universal contrast. In those examples, we see that English deploys word order differences to signal an object versus subject relative clause. But in other languages, like Tagalog, verb form differences do the same work. The expressions in sentences (5a) and (5b) differ by the single bracketed morpheme, expressed inside the verb, but they yield interpretations analogous to sentences (1a) and (1b).

5a. baboy na s<in>isipa ng kambing …
pig LNK PV.kick GEN goat
“The pig that the goat’s kicking …”

5b. baboy na s<um>isipa ng kambing …
pig LNK AV.kick GEN goat
“The pig that’s kicking the goat …”

Differences in word order and inflection notwithstanding, sentences like sentence (5a) are still less robustly processed than sentences like sentence (5b) in Tagalog, even though the verb form in sentence (5a) and its argument linking are generally more common. An old interpretation of these relatively new facts is that when the comprehender faces a temporarily ambiguous nominal, as they do when they hear “baboy” (“pig”) in the course of processing either sentence (5a) or (5b), then either they can wait or they can relieve the tension by assuming it is the grammatical subject (or agent, perhaps). What they do not do, it seems, is assume that it is a non-subject.

Parsing and local ambiguity

The basic process of assigning a syntactic analysis to a string is referred to as parsing. We can model comprehenders as transforming the linear string of words in speech or text into higher-order data structures, like trees or logical formulae, that ultimately determine how the sentence is interpreted in context. A major challenge for parsing in humans is that natural languages abound in local ambiguities. Their causes are diverse, but local ambiguities usually stem from a pernicious combination of the following:

Lexical category ambiguity: Many open-class words belong to multiple lexical categories. For example, “prune” in English can be a verb or a noun.
Syncretisms in functional morphology [see Morphology]: Functional morphology often appears formally identical across multiple uses (syncretism). For example, the sequence “-ak” in Basque can occur at the end of singular subjects of transitive clauses or be affixed to plural arguments that are not transitive subjects. Another example: English “that” introduces several distinct constructions, including complement clauses, relative clauses, and noun phrases.
Phonetically null positions: A word or phrase can often be absent from an otherwise expected position in many constructions, like movement dependencies (relativization, question formation, focus constructions), argument drop, ellipsis, and transitivity alternations. For example, the verb “play” can either be an intransitive verb with no direct object (as in, “Where did you see the children playing?”) or a transitive verb whose direct object has been moved elsewhere (as in “What did you see the children playing _?” with the null position marked by an underscore).
Recursively applicable syntactic rules: Syntactic rules often have broad and recursive applicability, which can lead to an increase in the number of syntactic analyses. For example, the phrase “the advisor to the politician from Utah” has two basic meanings due to the iterative use of modifiers (i.e., either the advisor or the politician could be from Utah).
Word order flexibility: Linguistic systems often allow flexibility in the order of words and phrases. For example, the Austronesian language Chamorro, like many verb-initial languages, can put the subject either immediately after the verb (V-S-X) or at the end of the sentence (V-X-S). The sentence “ha toktuk i patgun i tata,” could mean “the kid hugged the dad” or “the dad hugged the kid.”
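To make the point about recursively applicable rules concrete, the sketch below counts the analyses that a tiny grammar assigns to the modifier example above. The grammar, lexicon, and function name are assumptions made for illustration only; the counting procedure is a standard CKY-style chart over binary rules, not a model of human parsing.

```python
from collections import defaultdict

# Hypothetical toy grammar: binary rules as (parent, left child, right child).
BINARY_RULES = [("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")]
LEXICON = {
    "the": ["Det"], "advisor": ["N"], "politician": ["N"], "photo": ["N"],
    "to": ["P"], "from": ["P"], "in": ["P"], "Utah": ["NP"],
}

def count_parses(tokens, goal="NP"):
    """Count the distinct analyses of `tokens` as a `goal` phrase."""
    n = len(tokens)
    chart = defaultdict(int)  # (start, end, symbol) -> number of analyses
    for i, token in enumerate(tokens):
        for symbol in LEXICON.get(token, []):
            chart[(i, i + 1, symbol)] += 1
    for width in range(2, n + 1):          # build longer spans from shorter ones
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, split, left)] * chart[(split, end, right)]
                    )
    return chart[(0, n, goal)]

two_modifiers = count_parses("the advisor to the politician from Utah".split())
three_modifiers = count_parses(
    "the advisor to the politician from Utah in the photo".split()
)
```

With this grammar, `two_modifiers` is 2, matching the two readings described above, and adding a third modifier raises the count to 5, which shows how quickly recursive attachment multiplies analyses.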

Example (6) illustrates a local ambiguity created by the fact that the word “what” in English must be linked to a phonetically null position and yet provides no clues to which position.

6a. What did you paint …
The word “what” links to a null position, but it does not signal where.

6b. What did you paint _?
This is a plausible temporary interpretation.

6c. What did you paint the room with _?
But so is this one.

Incremental and predictive processing

The widespread existence of local ambiguity is not, on its own, a problem. The entire sentence (6c) has a single syntactic analysis and thus a single primary interpretation. However, humans seem to comprehend language both incrementally (e.g., Sedivy et al., 1999) and predictively (e.g., Altmann & Kamide, 1999; Staub & Clifton, 2006; Chow et al., 2016). They do not wait for a full phrase, clause, or sentence to analyze the material they have already encountered. Moreover, they make inferences about what is coming next. For this reason, local ambiguities can lead comprehenders astray if they choose the wrong analysis. Thus, in example (6), comprehenders initially do go through the parse state represented in sentence (6b) before arriving, with some effort, at sentence (6c) (Stowe, 1986).

The famous garden-path sentence “the horse raced past the barn fell” was a parade case of local ambiguity resolution gone wrong (Bever, 1970). Example (7a) gives an analogous version, which is disambiguated in (7b). The local ambiguity at the word “melted” is caused by the optionality of the word “that,” a particular rule of English allowing deletion of the verb “be,” and the morphological syncretism in most English verbs as they occur in past tense transitive clauses and in passive clauses.

7a. The butter <melted> in the pan was starting to burn. (<local ambiguity>)

7b. The butter that was melted in the pan was starting to burn. (no local ambiguity)

Sentences that follow the pattern of sentence (7a) are appreciably difficult to understand. Reading time studies, especially those using eye-tracking-while-reading paradigms (e.g., Clifton et al., 2003), localize much of this difficulty to the site of disambiguation, which, in the case of sentence (7a), would be the verb “was” (although not all such studies find this).

A continuing area of debate centers on what causes disambiguation difficulty. Broadly speaking, there are two views: One is that the comprehender pursues a primary analysis that may turn out to be wrong. Disambiguation difficulty is a function, under this view, of recognizing the mistake and reanalyzing (Fodor & Inoue, 1994; Paape & Vasishth, 2022). Earlier reanalysis theories focused on parsers that functioned deterministically, always making the same decision in the face of the same information (Marcus, 1980). However, contemporary theories assume stochastic decision-making, where the same choice point can lead to a range of outcomes, and at least mild parallelism, where those decisions can take into account more than one parse of the input. Alternatively, difficulty can be seen as caused by the low contextual probability of the disambiguating word, which points to probabilistic inference over a distribution of particular structures or interpretations, as in Surprisal Theory or other information-theoretic approaches (Hale, 2003; Levy, 2008). It is conceivable that these two views may be eventually reconciled as providing theories of an information processing system at different levels of description (Marr, 1982), the former view being an algorithmic-level theory and the latter a computational-level theory. Current information-theoretic accounts implicate considerable parallelism in terms of the number or extent of analyses that are, in some sense, simultaneously under consideration. Whether this can be harmonized with other claims pointing to capacity limitations on working memory (McElree, 2006) or whether those limitations should be rejected is an unresolved issue (cf. Futrell et al., 2020). At present, much research remains devoted to using increasingly precise computational models, large sample sizes, and powerful statistical methods (e.g., van Schijndel & Linzen, 2021).

Dependency formation

Another key topic in sentence processing is dependency formation. Linguistic expressions are knit together with a variety of nonadjacent dependencies: links between words and phrases where the interpretation or form of one element depends upon another. Some prominent and well-investigated examples include agreement, pronoun resolution, filler-gap dependencies, and ellipsis. These are illustrated with English sentences in example (8) where the words anchoring the dependency are bracketed.

8a. Agreement: The <contract> between the two parties <was> determined to be invalid.

8b. Pronouns: The candidates who <Al> endorsed spoke about <him> fondly.

8c. Filler-gap: Gil knew <which paths> their admirer chose to <walk> _ each day.

8d. Ellipsis: Marc’s <new poem> was widely seen as inferior to <Odette’s>.

In sentences (8a)–(8d), the rightmost dependent can only be correctly parsed and interpreted with reference to the leftmost dependent. Notably, an indefinite amount of material can intervene between the two elements. Comprehenders must be able to encode information about the left dependent in a way that is durable and accessible when the right dependent is processed, which could be hundreds or thousands of milliseconds later. Dependency formation has thus been heavily investigated as a window into how working memory is implicated in language processing. One important debate has been how specialized such working memory is for sentence processing (Caplan & Waters, 2013). Another has been how to apply the associative memory models present in many other cognitive systems to linguistic dependency formation (Lewis & Vasishth, 2005). A critical data pattern informing this research has been the existence of interference effects. Interference occurs when intervening linguistic material similar to the target dependent/dependency somehow intrudes upon comprehension. Agreement attraction (Wagers et al., 2009) and finite subject interference (Van Dyke & Lewis, 2003) are two notable phenomena argued to reflect erroneous attempts at retrieving the correct left dependent from memory. In agreement attraction, comprehenders fail to notice an agreement error between a verb and its subject if there is a sufficiently similar element that could agree with the verb. Thus, the ungrammatical version of sentence (8a), “the <contract> between the two parties <were> determined to be invalid,” is challenging to recognize as containing an error, because even though “the contract” requires the third-person singular form “was,” the intervening plural noun “parties,” which agrees with “were,” makes the sentence sound acceptable.
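The retrieval-based account of agreement attraction can be illustrated with a deliberately simplified sketch. It is not the ACT-R model of Lewis and Vasishth (2005): the feature sets, the cue-counting score, and the softmax choice rule are all assumptions introduced here to show how a partially matching distractor can compete with the true subject.

```python
import math

def retrieval_probabilities(items, cues, temperature=1.0):
    """Map each memory item to a retrieval probability.

    Score = number of retrieval cues the item matches (an assumed scoring
    scheme); probabilities come from a softmax over the scores.
    """
    scores = {
        name: sum(1 for cue, value in cues.items() if features.get(cue) == value)
        for name, features in items.items()
    }
    weights = {name: math.exp(score / temperature) for name, score in scores.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical encodings of the two nouns in sentence (8a).
items = {
    "contract": {"role": "subject", "number": "singular"},
    "parties": {"role": "non-subject", "number": "plural"},
}
# Cues assumed to be projected by the verb: it seeks a subject matching its number.
grammatical = retrieval_probabilities(items, {"role": "subject", "number": "singular"})
ungrammatical = retrieval_probabilities(items, {"role": "subject", "number": "plural"})
```

In this sketch the distractor “parties” is rarely retrieved when the verb is “was”, but it ties with “contract” when the verb is “were”, because each noun then matches exactly one cue. That tie is one way to picture why the error can go unnoticed.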

Dependency formation is thus sometimes sensitive to information that the grammar says it ought not to have access to. This influence may stem from how previously encountered information is reaccessed, especially when a sufficiently distinct retrieval cue in context cannot be mustered (Nairne, 2006). However, according to another view, the encoding of dependents in the left context itself may be labile or subject to distortion (Yadav et al., 2023). Both the retrieval and encoding viewpoints have typically presupposed that the comprehender creates a single mutually consistent representation of the sentence. However, exhaustively parsed, faithful and self-consistent representations may simply not be necessary to serve the goals of the comprehender, as contemplated by the Good Enough framework (Ferreira & Patson, 2007). A radical alternative is presented by the dynamical systems approach taken by self-organized sentence processing (Smith et al., 2021). Models in this framework allow multiple parses, including globally ungrammatical ones, to interact more freely and generate syntactic structures from the bottom up. For example, in local syntactic coherences (Tabor et al., 2004), substrings that are otherwise independent sentences can create difficulty even though they are not parsable in the global context. The bracketed words in sentence (9) constitute a local syntactic coherence in English. Because they occur after the preposition “at,” they can only be grammatically parsed as a nominal expression [a reduced, passive relative clause, cf. sentence (7)]. But their parsability as a main clause in other environments (i.e., when the other words are ignored) may nonetheless lead to the erroneous consideration of that parse here.

9. The coach smiled at <the player tossed the frisbee> by her careless opponent.

Priming

The debates above reflect evolving, open questions about the short-term representation of linguistic information during the processing of a sentence. Another important issue is how single episodes of sentence processing affect the long-term representation of syntactic knowledge. One way to frame this question is in terms of priming: Does the processing of a particular sentence structure now ease the processing of the same sentence structure later? The existence of priming effects in language production has been well established for decades (Bock, 1986), but only more recently have researchers investigated them in comprehension (e.g., Arai et al., 2007; Thothathiri & Snedeker, 2008; Pickering et al., 2013). In production priming, the choice to produce a particular syntactic structure from alternatives increases the likelihood of making the same choice again in subsequent trials. For example, the sentences in example (10) are differently structured, but they convey the same meaning. In sentence (10a) the recipient of the sending event is expressed inside a prepositional phrase (“to his beloved”), whereas in sentence (10b) it is expressed as the first object in a double-object construction.

10a. Lorenzo sent a sonnet to <his beloved>.

10b. Lorenzo sent <his beloved> a sonnet.

10c. Mario mailed <him> the verses back. (> Mario mailed the verses back to <him>).

Having produced sentence (10b) makes one more likely to produce double-object constructions again in subsequent sentences expressing similar events [like sentence (10c)]. Although the evidence remains sparser, it now seems clear that first comprehending a sentence like sentence (10b) also makes it easier to process a subsequent sentence with the same abstract structure. Major questions still under investigation concern how abstract the priming is, how long-lived it is, and what the underlying mechanism is. On one view, syntactic priming is a form of implicit learning (Chang et al., 2000), with language users gradually updating probabilities on syntactic rules or weights in a distributed representation. On another view, syntactic priming reflects short-term changes in the activation of stored syntactic representations (Pickering et al., 2013). It may be both (Tooley & Traxler, 2018). Cumulative effects of priming, sometimes separately referred to as syntactic adaptation, remain poorly understood (Fine & Jaeger, 2013; Harrington Stack et al., 2018), but they have clear implications for language learners (Chang et al., 2006; Kaan & Chun, 2018).
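The implicit-learning view can be pictured, very loosely, as an incremental update to the probability of choosing one structure over another. The sketch below uses a simple delta rule with an arbitrary learning rate; it is an assumption made for illustration and not the connectionist model of Chang et al. (2000, 2006). The function name and parameter values are hypothetical.

```python
def update_preference(p_double_object, saw_double_object, learning_rate=0.1):
    """Nudge the probability of the double-object structure toward the observed one."""
    target = 1.0 if saw_double_object else 0.0
    return p_double_object + learning_rate * (target - p_double_object)

# Start with no preference between (10a)-type and (10b)-type structures.
p = 0.5
after_one_prime = update_preference(p, saw_double_object=True)

# Repeated exposure accumulates, loosely analogous to syntactic adaptation.
after_five_primes = p
for _ in range(5):
    after_five_primes = update_preference(after_five_primes, saw_double_object=True)
```

A short-term activation account would differ mainly in letting the boost decay between trials. This sketch has no decay, which is a design choice and not a claim about the data.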

Questions, controversies, and new developments

Broad empirical coverage and large-scale evaluation

Many of the field’s newest developments have come from attempts to renovate its empirical foundation. Traditionally that foundation has been the experimental method: bespoke data collection in controlled conditions, with key explanatory variables under experimenter control. In recent decades, however, a complementary broad coverage approach to theory evaluation has gained prominence. This view, rooted in model evaluation approaches common in natural language processing, evaluates a theory’s ability to explain variation in a corpus of psycholinguistic data, such as large corpora of reading time data (e.g., Demberg & Keller, 2008). These approaches have the benefit of examining psycholinguistic data collected under relatively natural conditions, typically do not rely on artificially created experimental stimuli, and support theory development by encouraging the development of computationally precise theories whose predictions can be quantitatively evaluated on large bodies of data. However, broad coverage theory evaluation can suffer if the corpus of data is not sufficiently diverse with respect to the important variables of interest (Futrell et al., 2021) and, as correlational data, can impede researchers from drawing unambiguous causal conclusions about the factors that influence sentence processing difficulty.

Artificial intelligence

An especially fast-moving development is the renewal of the relationship between sentence processing and artificial intelligence. These two fields have long been mutually reinforcing. For example, next-word prediction, which is central to many neural language models, has long been theorized to be a key organizing principle of sentence comprehension (Elman, 1990), and the development of incremental language models in natural language processing spurred developments in Surprisal Theory (Hale, 2001). Perhaps unsurprisingly, given the recent rapid expansion of research on AI and language models [see Large Language Models], new areas of investigation in sentence processing have taken root at this intersection. Several questions naturally arise here: How close is the alignment between human and AI language processing (Huang et al., 2024)? Do language models encode linguistic representations that bear any similarity to the grammatical representations posited by linguistic and psycholinguistic theorists (Manning et al., 2020)? And can psycholinguistic data be leveraged to minimize the gap between human and machine language processing (Eisape et al., 2020)?
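The link between next-word prediction and processing difficulty is usually stated in terms of surprisal, the negative log probability of a word given its context. The sketch below computes surprisal from a smoothed bigram model. The bigram model, the toy corpus, and the smoothing constant are assumptions made for illustration: they stand in for the much richer incremental models used in practice, and this is not the probabilistic Earley parser of Hale (2001).

```python
import math
from collections import Counter

# Toy corpus, invented for illustration; real work uses large corpora or neural models.
corpus = "the dog chased the cat . the cat chased the mouse . the dog slept .".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of (previous word, word) pairs
contexts = Counter(corpus[:-1])              # counts of each word as a context
vocab_size = len(set(corpus))

def surprisal(prev, word, alpha=1.0):
    """Surprisal in bits, -log2 P(word | prev), with add-alpha smoothing."""
    prob = (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * vocab_size)
    return -math.log2(prob)

print(round(surprisal("the", "dog"), 2))     # continuation seen in the toy corpus
print(round(surprisal("the", "slept"), 2))   # continuation never seen after "the"
```

On the surprisal view, the unseen continuation receives the higher value and is therefore predicted to be harder to process.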

Linguistic and language user diversity

Finally, the database of sentence processing has historically relied on a very narrow slice of human and linguistic diversity. And just as there has been a broader grappling with the lack of this diversity in other cognitive and language sciences (Blasi et al., 2022; Kidd & Garcia, 2022), sentence processing research has had to address the fact that the languages it has investigated are shockingly few. Five Indo-European languages of Western Europe (English, German, French, Dutch, Spanish) and two East Asian languages (Japanese, Mandarin) have historically made up the vast majority of research. Those languages are all national languages associated with power and prestige, whose large numbers of users are literate and available to participate in experiments in university settings (cf. Henrich et al., 2010) [see WEIRD]. They do not represent many of the typologically important features of natural language that could play important roles in theory development and adjudication, such as non-nominative case alignment systems, verb-initial word order patterns, and verbal inflection for categories other than person and number (Bornkessel-Schlesewsky & Schlesewsky, 2016). Moreover, the overrepresentation of literacy and reading in sentence processing research has undoubtedly smuggled in a number of biases, including an overreliance on orthographic words (cf. Krauska & Lau, 2023) and an inattention to the multimodal array of information in which language is embedded (cf. Holler & Levinson, 2019). Encouragingly, the past 15 years of sentence processing research have witnessed a small but noticeable shift in researchers’ practices (Sauppe et al., 2023; Collart, 2024). Bodies of research that represent language processing in underrepresented language families, often in community-based settings and in partnership with local stakeholders, are now gradually becoming established (e.g., Pizarro-Guevara & Garcia, 2024).

Broader connections

This article has focused on core properties of sentence processing primarily from the vantage of psycholinguistic frameworks and computational models. However, a rich but somewhat separate body of research investigates the neural basis of language processing (Poeppel et al., 2012; Hagoort, 2016) [see Neuroscience of Language]. One leading question concerns the primitive operations that support linguistic processes at different levels, including compositional syntax and semantics (Pylkkänen, 2019; Matchin & Hickok, 2020) and how hierarchical structure can be represented at different timescales (Ding et al., 2016; Kaufeld et al., 2020). Another major challenge is to understand the broader spatiotemporal organization of the constituent language networks and their relationship to other brain networks (Hickok & Poeppel, 2007; Fedorenko et al., 2024).

Critical perspectives on sentence processing come from theory development and data collection in a wider variety of populations and settings than the monolingual, college-attending adults who have featured in most studies. In addition to the dimensions of linguistic and human diversity discussed above, other key contexts include sentence processing in child language acquisition (Snedeker & Trueswell, 2004; Omaki & Lidz, 2015) [see Language Acquisition], in second language learning (Kaan & Grüter, 2021; Hopp, 2022), in developmental and acquired language disorders (Marinis, 2024) [see Developmental Language Disorder], and in individual differences research that includes a broader sample of typically developing adults (Van Dyke et al., 2014) or better characterizes their experience (James et al., 2018) [see Linguistic Variation].

Acknowledgments

We gratefully acknowledge the support we received during the preparation of this article from the National Science Foundation (2019804); from the Spencer Foundation to the University of California, Santa Cruz; and from the NSF/US-Israel Binational Science Foundation (2146798) and the National Science Foundation (2020914, 1941485) to the University of Massachusetts, Amherst.

References

Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73(3), 247–264. https://doi.org/10.1016/S0010-0277(99)00059-1
Arai, M., van Gompel, R. P. G., & Scheepers, C. (2007). Priming ditransitive structures in comprehension. Cognitive Psychology, 54(3), 218–250. https://doi.org/10.1016/j.cogpsych.2006.07.001
Bever, T. G. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (Ed.), Cognition and the development of language (pp. 279–362). Wiley.
Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., & Majid, A. (2022). Over-reliance on English hinders cognitive science. Trends in Cognitive Sciences, 26(12), 1153–1170.
Bock, J. K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18(3), 355–387. https://doi.org/10.1016/0010-0285(86)90004-6
Bornkessel-Schlesewsky, I., & Schlesewsky, M. (2016). The importance of linguistic typology for the neurobiology of language. Linguistic Typology, 20(3), 615–621. https://doi.org/10.1515/lingty-2016-0032
Caplan, D., & Waters, G. S. (2013). Memory mechanisms supporting syntactic comprehension. Psychonomic Bulletin & Review, 20(2), 243–268. https://doi.org/10.3758/s13423-012-0369-9
Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological Review, 113(2), 234–272. https://doi.org/10.1037/0033-295X.113.2.234
Chang, F., Dell, G. S., Bock, K., & Griffin, Z. M. (2000). Structural priming as implicit learning: A comparison of models of sentence production. Journal of Psycholinguistic Research, 29(2), 217–230. https://doi.org/10.1023/A:1005101313330
Chomsky, N. (1957). Syntactic structures. Mouton. https://doi.org/10.1515/9783112316009
Chow, W. Y., Smith, C., Lau, E. F., & Phillips, C. (2016). A “bag-of-arguments” mechanism for initial verb predictions. Language, Cognition and Neuroscience, 31(5), 577–596. https://doi.org/10.1080/23273798.2015.1066832
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. https://doi.org/10.1017/S0140525X12000477
Clifton, C., Traxler, M. J., Mohamed, M. T., Williams, R. S., Morris, R. K., & Rayner, K. (2003). The use of thematic role information in parsing: Syntactic processing autonomy revisited. Journal of Memory and Language, 49(3), 317–334. https://doi.org/10.1016/S0749-596X(03)00070-6
Collart, A. (2024). A decade of language processing research: Which place for linguistic diversity? Glossa Psycholinguistics, 3(1), 1432. http://dx.doi.org/10.5070/G60111432
Crain, S., & Steedman, M. (1985). On not being led up the garden path: The use of context by the psychological syntax processor. In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing: Psychological, computational, and theoretical perspectives (pp. 320–358). Cambridge University Press. https://doi.org/10.1017/CBO9780511597855.011
Demberg, V., & Keller, F. (2008). Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109(2), 193–210. https://doi.org/10.1016/j.cognition.2008.07.008
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164. https://doi.org/10.1038/nn.4186
Eisape, T., Zaslavsky, N., & Levy, R. (2020). Cloze distillation: Improving neural language models with human next-word prediction. In R. Fernández & T. Linzen (Eds.), Proceedings of the 24th Conference on Computational Natural Language Learning (pp. 609–619). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.conll-1.49
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211. https://doi.org/10.1207/s15516709cog1402_1
Fedorenko, E., Ivanova, A. A., & Regev, T. I. (2024). The language network as a natural kind within the broader landscape of the human brain. Nature Reviews Neuroscience, 25(5), 289–312. https://doi.org/10.1038/s41583-024-00802-4
Ferreira, F., & Patson, N. D. (2007). The ‘good enough’ approach to language comprehension. Language and Linguistics Compass, 1(1‐2), 71–83. https://doi.org/10.1111/j.1749-818X.2007.00007.x
Fine, A. B., & Jaeger, T. F. (2013). Evidence for implicit learning in syntactic comprehension. Cognitive Science, 37(3), 578–591.
Fodor, J. A. (1983). The modularity of mind. MIT Press. https://doi.org/10.7551/mitpress/4737.001.0001
Fodor, J. A., Bever, T. G., & Garrett, M. F. (1974). The psychology of language: An introduction to psycholinguistics and generative grammar. McGraw-Hill.
Fodor, J. D. (1998). Learning to parse? Journal of Psycholinguistic Research, 27(2), 285–319. https://doi.org/10.1023/A:1023258301588
Fodor, J. D., & Inoue, A. (1994). The diagnosis and cure of garden paths. Journal of Psycholinguistic Research, 23(5), 407–434. https://doi.org/10.1007/BF02143947
Frazier, L. (1979). On comprehending sentences: Syntactic parsing strategies. [Doctoral dissertation, University of Connecticut]. https://digitalcommons.lib.uconn.edu/dissertations/AAI7914150
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6(4), 291–325. https://doi.org/10.1016/0010-0277(78)90002-1
Frazier, L., Carlson, K., & Clifton, C., Jr. (2006). Prosodic phrasing is central to language comprehension. Trends in Cognitive Sciences, 10(6), 244–249.
Futrell, R., Gibson, E., & Levy, R. P. (2020). Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing. Cognitive Science, 44(3), e12814. https://doi.org/10.1111/cogs.12814
Futrell, R., Gibson, E., Tily, H. J., Blank, I., Vishnevetsky, A., Piantadosi, S. T., & Fedorenko, E. (2021). The Natural Stories corpus: A reading-time corpus of English texts containing rare syntactic constructions. Language Resources and Evaluation, 55(1), 63–77. https://doi.org/10.1007/s10579-020-09503-7
Gibson, E., Bergen, L., & Piantadosi, S. T. (2013). Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences, 110(20), 8051–8056. https://doi.org/10.1073/pnas.1216438110
Hagoort, P. (2016). MUC (Memory, Unification, Control): A model on the neurobiology of language beyond single word processing. In G. Hickok & S. L. Small (Eds.), Neurobiology of language (pp. 339–347). Academic Press. https://doi.org/10.1016/B978-0-12-407794-2.00028-6
Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies. https://doi.org/10.3115/1073336.1073357
Hale, J. (2003). The information conveyed by words in sentences. Journal of Psycholinguistic Research, 32(2), 101–123. https://doi.org/10.1023/A:1022492123056
Harrington Stack, C. M., James, A. N., & Watson, D. G. (2018). A failure to replicate rapid syntactic adaptation in comprehension. Memory & Cognition, 46(6), 864–877. https://doi.org/10.3758/s13421-018-0808-6
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61–83. https://doi.org/10.1017/S0140525X0999152X
Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5), 393–402. https://doi.org/10.1038/nrn2113
Holler, J., & Levinson, S. C. (2019). Multimodal language processing in human communication. Trends in Cognitive Sciences, 23(8), 639–652.
Hopp, H. (2022). Second language sentence processing. Annual Review of Linguistics, 8(1), 235–256. https://doi.org/10.1146/annurev-linguistics-030821-054113
Huang, K. J., Arehalli, S., Kugemoto, M., Muxica, C., Prasad, G., Dillon, B., & Linzen, T. (2024). Large-scale benchmark yields no evidence that language model surprisal explains syntactic disambiguation difficulty. Journal of Memory and Language, 137, 104510. https://doi.org/10.1016/j.jml.2024.104510
James, A. N., Fraundorf, S. H., Lee, E. K., & Watson, D. G. (2018). Individual differences in syntactic processing: Is there evidence for reader-text interactions? Journal of Memory and Language, 102, 155–181. https://doi.org/10.1016/j.jml.2018.05.006
Kaan, E., & Chun, E. (2018). Priming and adaptation in native speakers and second-language learners. Bilingualism: Language and Cognition, 21(2), 228–242. https://doi.org/10.1017/S1366728916001231
Kaan, E., & Grüter, T. (Eds.). (2021). Prediction in second language processing and learning. John Benjamins Publishing Company. https://doi.org/10.1075/bpa.12
Kaufeld, G., Bosker, H. R., Ten Oever, S., Alday, P. M., Meyer, A. S., & Martin, A. E. (2020). Linguistic structure and meaning organize neural oscillations into a content-specific hierarchy. Journal of Neuroscience, 40(49), 9467–9475. https://doi.org/10.1523/JNEUROSCI.0302-20.2020
Kidd, E., & Garcia, R. (2022). How diverse is child language acquisition research? First Language, 42(6), 703–735.
Kimball, J. (1973). Seven principles of surface structure parsing in natural language. Cognition, 2(1), 15–47. https://doi.org/10.1016/0010-0277(72)90028-5
Krauska, A., & Lau, E. (2023). Moving away from lexicalism in psycho- and neuro-linguistics. Frontiers in Language Sciences, 2, 1125127. https://doi.org/10.3389/flang.2023.1125127
Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106(3), 1126–1177. https://doi.org/10.1016/j.cognition.2007.05.006
Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375–419. https://doi.org/10.1207/s15516709cog0000_25
MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101(4), 676–703. https://doi.org/10.1037/0033-295X.101.4.676
Manning, C. D., Clark, K., Hewitt, J., Khandelwal, U., & Levy, O. (2020). Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48), 30046–30054. https://doi.org/10.1073/pnas.1907367117
Marcus, M. P. (1980). A theory of syntactic recognition for natural language. MIT Press.
Marinis, T. (2024). Syntactic processing in developmental and acquired language disorders. In M. J. Ball, N. Müller, & E. Spencer (Eds.), The handbook of clinical linguistics (2nd ed., pp. 189–199). https://doi.org/10.1002/9781119875949.ch14
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman.
Matchin, W., & Hickok, G. (2020). The cortical organization of syntax. Cerebral Cortex, 30(3), 1481–1498. https://doi.org/10.1093/cercor/bhz180
McElree, B. (2000). Sentence comprehension is mediated by content-addressable memory structures. Journal of Psycholinguistic Research, 29(2), 111–123. https://doi.org/10.1023/A:1005184709695
McElree, B. (2006). Accessing recent events. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 46, pp. 155–200). Academic Press.
Miller, G. A., & Chomsky, N. (1963). Finitary models of language users. In R. D. Luce, R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. 2, pp. 419–491). Wiley.
Nairne, J. S. (2006). Modeling distinctiveness: Implications for general memory theory. In R. R. Hunt & J. Worthen (Eds.), Distinctiveness and memory (pp. 26–46). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195169669.003.0002
Omaki, A., & Lidz, J. (2015). Linking parser development to acquisition of syntactic knowledge. Language Acquisition, 22(2), 158–192. https://doi.org/10.1080/10489223.2014.943903
Paape, D., & Vasishth, S. (2022). Is reanalysis selective when regressions are consciously controlled? Glossa Psycholinguistics, 1(1), 39. https://doi.org/10.5070/G601139
Phillips, C., Wagers, M. W., & Lau, E. F. (2011). Grammatical illusions and selective fallibility in real-time language comprehension. In J. Runner (Ed.), Experiments at the Interfaces (pp. 147–180). Brill. https://doi.org/10.1163/9781780523750_006
Pickering, M. J., McLean, J. F., & Branigan, H. P. (2013). Persistent structural priming and frequency effects during comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(3), 890–897. https://doi.org/10.1037/a0029181
Pizarro-Guevara, J. S., & Garcia, R. (2024). Philippine Psycholinguistics. Annual Review of Linguistics, 10(1), 145–167. https://doi.org/10.1146/annurev-linguistics-031522-102844
Poeppel, D., Emmorey, K., Hickok, G., & Pylkkänen, L. (2012). Towards a new neurobiology of language. Journal of Neuroscience, 32(41), 14125–14131. https://doi.org/10.1523/JNEUROSCI.3244-12.2012
Pylkkänen, L. (2019). The neural basis of combinatory syntax and semantics. Science, 366(6461), 62–66. https://doi.org/10.1126/science.aax0050
Sauppe, S., Andrews, C., & Norcliffe, E. (2023). Experimental research in cross-linguistic psycholinguistics. In S. Zufferey & P. Gygax (Eds.), The Routledge handbook of experimental linguistics (pp. 156–172). Routledge. https://doi.org/10.4324/9781003392972-12
Sedivy, J. C., Tanenhaus, M. K., Chambers, C. G., & Carlson, G. N. (1999). Achieving incremental semantic interpretation through contextual representation. Cognition, 71(2), 109–147. https://doi.org/10.1016/S0010-0277(99)00025-6
Smith, G., Franck, J., & Tabor, W. (2021). Encoding interference effects support self-organized sentence processing. Cognitive Psychology, 124, 101356. https://doi.org/10.1016/j.cogpsych.2020.101356
Snedeker, J., & Trueswell, J. C. (2004). The developing constraints on parsing decisions: The role of lexical-biases and referential scenes in child and adult sentence processing. Cognitive Psychology, 49(3), 238–299. https://doi.org/10.1016/j.cogpsych.2004.03.001
Staub, A., & Clifton, C., Jr. (2006). Syntactic prediction in language comprehension: Evidence from either...or. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(2), 425–436. https://doi.org/10.1037/0278-7393.32.2.425
Stowe, L. A. (1986). Parsing WH-constructions: Evidence for on-line gap location. Language and Cognitive Processes, 1(3), 227–245. https://doi.org/10.1080/01690968608407062
Tabor, W., Galantucci, B., & Richardson, D. (2004). Effects of merely local syntactic coherence on sentence processing. Journal of Memory and Language, 50(4), 355–370. https://doi.org/10.1016/j.jml.2004.01.001
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268(5217), 1632–1634. https://doi.org/10.1126/science.7777863
Thothathiri, M., & Snedeker, J. (2008). Give and take: Syntactic priming during spoken language comprehension. Cognition, 108(1), 51–68. https://doi.org/10.1016/j.cognition.2007.12.012
Tooley, K. M., & Traxler, M. J. (2018). Implicit learning of structure occurs in parallel with lexically-mediated syntactic priming effects in sentence comprehension. Journal of Memory and Language, 98, 59–76. https://doi.org/10.1016/j.jml.2017.09.004
Van Dyke, J. A., & Lewis, R. L. (2003). Distinguishing effects of structure and decay on attachment and repair: A cue-based parsing account of recovery from misanalyzed ambiguities. Journal of Memory and Language, 49(3), 285–316. https://doi.org/10.1016/S0749-596X(03)00081-0
Van Dyke, J. A., Johns, C. L., & Kukona, A. (2014). Low working memory capacity is only spuriously related to poor reading comprehension. Cognition, 131(3), 373–403. https://doi.org/10.1016/j.cognition.2014.01.007
van Schijndel, M., & Linzen, T. (2021). Single‐stage prediction models do not explain the magnitude of syntactic disambiguation difficulty. Cognitive Science, 45(6), e12988. https://doi.org/10.1111/cogs.12988
Wagers, M. W., Lau, E. F., & Phillips, C. (2009). Agreement attraction in comprehension: Representations and processes. Journal of Memory and Language, 61(2), 206–237. https://doi.org/10.1016/j.jml.2009.04.002
Yadav, H., Smith, G., Reich, S., & Vasishth, S. (2023). Number feature distortion modulates cue-based retrieval in reading. Journal of Memory and Language, 129, 104400. https://doi.org/10.1016/j.jml.2022.104400
