Common ground is the sum of the knowledge, beliefs, perceptions, and assumptions that a set of people believe they share. For people to do things together, they must act on information in their common ground. Two people playing tennis must assume they share the rules of the game, the ability to serve and volley, a perception of their physical surroundings, and a commitment to playing the current game. People in conversation need analogous information. They must assume they share a language (e.g., English or Japanese), the ability to speak and listen, a perception of their physical surroundings, and a commitment to participating in the current conversation. Joint activities are impossible without the right common ground. While common ground is ubiquitous in human conversation, the notion raises important issues for fiction and for artificial agents.

History

Concepts akin to common ground have been around for a long time, but most were informal. In one of the first formal analyses, Thomas Schelling (1960) examined what people do in coordination games. Example: “Name ‘heads’ or ‘tails.’ If you and your partner do the same thing, you both win a prize. If you don’t, neither of you wins.” To win, according to Schelling, players “must ‘mutually recognize’ some unique signal that coordinates their expectations of each other.” He called these mutually recognized signals focal points [see Signaling].

David Lewis (1969) drew on Schelling’s coordination games in formalizing the notion of common knowledge. Suppose two people agree to meet at a particular place and time. Their agreement, for Lewis, is a coordination device. It provides the two people with a “basis” for taking when and where they expect to meet to be common knowledge. Lewis went on to argue that conventions are a type of coordination device and that languages are based on a system of conventions. It is conventional, for example, that a “canine animal” is “dog” in English, “chien” in French, and “hond” in Dutch. Communicating in English, French, and Dutch is impossible without appeal to their distinct systems of conventions.

Robert Stalnaker (1978), in turn, drew on Lewis’s analyses in introducing common ground as a feature of everyday conversation. “Roughly speaking,” Stalnaker argued, “the presuppositions of a speaker … are what is taken by the speaker to be the common ground of the participants in the conversation, what is treated as their common knowledge or mutual knowledge” (p. 321). In conversation, each new utterance updates the participants’ common ground. Later, Stalnaker (2002, 2014) enlarged the notion of common ground to include other shared attitudes.

Core concepts

Common ground is a special type of shared information. Suppose Ken and Maggy are in London, and one morning they agree to meet at the British Museum at noon. It is not enough for them each to believe they expect to meet at the British Museum at noon. They must each take it that:

(x) the two of them each believe (a) that they each expect to meet at the British Museum at noon and (b) that statement x is true.

Part a represents Ken and Maggy’s individual beliefs about meeting at noon. Although those beliefs happen to be the same, that is not enough. It takes part b for Ken and Maggy to turn their individual beliefs into a common, or mutual, belief. Part b, in effect, is a bird’s-eye view, within a single framework, of the two of them and their individual beliefs. Ken and Maggy establish other information as common knowledge, common assumptions, or common perceptions. They do that when “believe” in this schema is replaced with “know,” “assume,” or “perceive.” So, the common ground of a set of people is the sum of their common, or mutual, knowledge, beliefs, assumptions, and perceptions.

Common ground is a property not of individuals but of sets of individuals (Lewis, 1969). These sets come in two main types (Clark, 1996; see Clark & Marshall, 1981). One type consists of individuals, such as Ken and Maggy, who have experienced certain events together. The other type consists of individuals, such as French Canadians or dentists, who merely belong to the same cultural community. The common ground possessed by the two types is fundamentally different.

Communal common ground

Cultural communities are sets of people identified by the cultural practices and expertise they share. Pediatricians are one such community. Once two people mutually believe they are both pediatricians, they have a basis (in Lewis’s sense) for considering the pediatric practices and expertise they share as common ground. This is communal common ground.

Cultural communities are a rich source of common ground because everyone belongs to many of them (Clark, 1996, 1998). Suppose Julia belongs to these: English speakers, US citizens, California residents, MIT graduates, electrical engineers, IBM employees, Democrats, Lutherans, 30-year-olds, women, and ski buffs. Julia can assume that she shares, to varying degrees, the cultural practices and expertise of each of these communities with other members of those communities.

People have many ways of establishing that they share membership in a cultural community. One way is with introductions (Svennevig, 2000). Ken: “Julia, this is Desmond from Singapore. Like you, he loves to ski.” Julia: “So, Desmond, where do you ski?” Another way is with probes (Schegloff, 1972, 2000). In London, Ken can stop a stranger and ask, “Are you from around here?” and, if the answer is yes, ask for directions to the British Museum. In Paris, he can ask a passerby, “Do you speak English?” and then ask for directions to the Louvre. Still another way is with perceptual evidence. On meeting Julia, Desmond might note her accent, gender, age, and clothing and infer that she is North American, a woman, a 30-year-old, and a skier—a first step in becoming acquainted. And there is serendipity. Visiting the home of a new colleague, I noticed a book of Mozart sonatas on his piano. Once I told him I had played some of them, our conversation took off.

Personal common ground

The bases for personal common ground, in contrast, are joint personal experiences. These are events that people jointly experience, typically with joint attention, in social settings such as eating together, playing tennis, or walking together. Still, the paradigmatic setting for personal common ground is conversation. People talking to each other use their speech and gestures to update their current common ground (Stalnaker, 2014).

It is a mistake to ignore the differences between personal and communal common ground. Personal common ground is based on biographical experiences with specific people at specific places and times. It is information particular to those people, places, and times. Communal common ground, in contrast, is based on the expertise and practices that people acquire as members of cultural communities. The information is generic and not tied to specific people at specific places and times. It is shared with huge collections of people, most of whom one has never met and is never likely to meet.

Memory and common ground

Just as there are two types of common ground, there are two major systems of memory, as research has long shown (Tulving, 1972). Episodic, or autobiographical, memory represents events that have happened to someone personally. Semantic, or generic, memory represents a person’s general knowledge of the world. Clearly, personal common ground is based primarily on episodic memory, and communal common ground is based on generic memory [see Memory].

Episodic memory, for example, is needed for so-called deictic expressions. When Ken tells Maggy, “The British Museum is over there,” he must point in the right direction for her to identify the building he is referring to. Deictic expressions like “over there” index elements in the speaker and addressee’s joint experiences (Fillmore, 1997). The elements they index may be locations in space (as with here, there, this, that, over there, upstairs, and the other side), locations in time (as with now, yesterday, today, tomorrow, and the other day) or people (as with I, you, we, and you guys). Expressions like these cannot be used or understood without information in episodic memory—in personal common ground.

Generic memory, on the other hand, is needed for the conventional words of a cultural community (Lewis, 1969). A communal lexicon is the set of conventional words that are based on the expertise and practices of a cultural community (Clark, 1998). In English, medical doctors take “sclerotic,” “aorta,” “myocardial,” and “infarction” to be part of their communal common ground just as San Franciscans take “Russian Hill,” “Lombard,” “Noe Valley,” and “Crissy Field” to be part of theirs. For people who speak English, a major cultural community is simply “English speakers.” People in this community take for granted that other English speakers know most of the ordinary words they know. In this view, all conventional words are based on information in people’s generic memory—in their communal common ground.

Grounding

In many accounts of common ground (e.g., Stalnaker, 2014), people are treated as ideal agents. Speakers are assumed to know exactly what they want to say and to speak without errors, and listeners are assumed to be correct in understanding what they hear. In reality, speakers are provisional in what they say, hesitating, making mistakes, and changing their minds as they speak. And listeners often mishear and misinterpret what is said. Because of these problems, people in conversation collaborate with each other to reach joint closure on what is said.

The process is called grounding (Brennan et al., 2010; Clark, 1996; Clark & Brennan, 1991; Clark & Schaefer, 1989; Clark & Wilkes-Gibbs, 1986). Consider this actual exchange:

Roger: Now [pause], um, do you and your husband have a j– car?
Nina: [pause] Have a car?
Roger: Yeah.
Nina: No.

Nina is unsure of Roger’s phrase “have a car,” so she queries it, and Roger answers “yeah.” Only then does Nina take his entire question to be in their common ground and answer it. Likewise, Roger does not take his question to be in their common ground until Nina answers it in turn four. The process of grounding relies on both positive and negative evidence. As positive evidence, addressees produce “yeah,” “uh-huh,” and other so-called back-channel responses, and as negative evidence, they ask questions like “What?” or “Whose husband?” or “Have a car?” People in conversation nevertheless realize that grounding is never perfect. They assume that to ground a thing is to establish it as part of common ground well enough for current purposes (Clark, 1996).

Grounding, however, is impossible in many settings. On the radio, on television, and in newspapers, reporters must make their reports comprehensible to their audiences without feedback. To do that, they must imagine the class of people they are reporting to and infer, as best they can, the communal common ground they share with them. That often requires subtle judgments.

Many audiences are also diverse. When chemistry professors lecture to a class of students, they must decide who to aim for—those who are most prepared, those who are least prepared, or both—and if so, how. In criminal trials, judges must tailor their language for everyone present—from the legal experts (e.g., attorneys) to the legal novices (e.g., jurors and spectators). People gossiping on buses must take account of potential eavesdroppers (Clark & Schaefer, 1987) and so must spies passing on secrets in the presence of others. Dealing with diverse audiences is a challenge.

Questions, controversies, and new developments

An issue for many scholars is how to represent common ground formally, and they have suggested several formulations. In some of these, common ground is represented in relation to the evidence it is based on. In others, it is represented independent of that evidence. Here are three formulations (Clark, 1996, chapter 4).

Common ground (CG)-iterated consists of iterated propositions. Suppose Ken and Maggy are the sole members of set S, and p is the proposition, “The members of set S expect to meet at the British Museum at noon.” In the following, “having information” includes knowing, believing, assuming, perceiving, and other attitudes.

Proposition p is common ground for the members of set S if and only if

  1. they each have information that p.

  2. they each have information that they each have information that p.

  3. they each have information that they each have information that they each have information that p.

  4. they each have information that they each have information that they each have information that they each have information that p.

And so on ad infinitum.

For many theorists (e.g., Aumann, 1976; Schiffer, 1972; Stalnaker, 2014), CG-iterated is the only proper representation of common ground.

CG-iterated, however, is manifestly impossible as a representation of common ground in mental processes (Clark & Marshall, 1981). There is no way Ken and Maggy could consider or verify the infinite number of statements needed for meeting at the British Museum. To get around this problem, many theorists assume that people store or verify only the first two, three, or four lines of CG-iterated, but that is not much help. Line one expands into two propositions (“Ken has information that p,” and “Maggy has information that p”), and lines two, three, and four expand into four, eight, and 16 propositions for a total of 30 propositions. This total skyrockets when Ken and Maggy are joined by even one or two friends. When set S has three members, the total becomes 120, and when it has four, the total becomes 340. When it has 10 members (imagine nine tourists and a guide), the total is over 11,000. These totals are clearly implausible for actual mental processes.
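The arithmetic behind these totals can be checked with a short sketch. It assumes, as the counts in the text imply, that line k of CG-iterated expands into one proposition for every ordered chain of k members, including chains in which a member appears more than once. The function names `expansions` and `total_propositions` are illustrative, not taken from the literature.

```python
from itertools import product


def expansions(members, line):
    """List the individual propositions that one line of CG-iterated expands into.

    Line k becomes one proposition per ordered chain of k members,
    so a set with n members yields n ** k propositions.
    """
    return [
        " ".join(f"{name} has information that" for name in chain) + " p"
        for chain in product(members, repeat=line)
    ]


def total_propositions(members, lines=4):
    """Count the propositions in the first `lines` lines of CG-iterated."""
    return sum(len(expansions(members, k)) for k in range(1, lines + 1))


# Line one for Ken and Maggy gives the two propositions quoted in the text.
for proposition in expansions(["Ken", "Maggy"], 1):
    print(proposition)

# Totals for the first four lines with 2, 3, 4, and 10 members.
for size in (2, 3, 4, 10):
    group = [f"member{i}" for i in range(1, size + 1)]
    print(size, total_propositions(group))
```

For sets of 2, 3, 4, and 10 members the sketch gives 30, 120, 340, and 11,110 propositions, which matches the totals above.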

CG-shared-basis (Lewis, 1969) represents common ground in relation to a basis b. Suppose set S is Ken and Maggy, p is their expectation of meeting at the British Museum, and their expectation is based on their verbal agreement to meet, basis b. The representation for common ground is this:

Proposition p is common ground for the members of set S if and only if

  1. they each have information that basis b holds.

  2. basis b indicates to each of them that they each have information that basis b holds.

  3. basis b indicates to each of them that p.

Unlike CG-iterated, this formulation is finite, and basis b itself can vary in the strength of its evidence (Lewis, 1969). So, CG-shared-basis is not implausible as a mental representation of common ground (see Barwise, 2016; Clark, 1996).
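One way to see why this formulation is finite is to write its three conditions as a simple check. The sketch below is a toy model under strong assumptions: the `Basis` record, its fields, and `is_common_ground` are hypothetical names, and "indicates" is reduced to a lookup rather than an account of evidence or its strength. Julia appears here only as a hypothetical bystander who was not party to the agreement.

```python
from dataclasses import dataclass, field


@dataclass
class Basis:
    """Hypothetical record of a basis b, such as a verbal agreement to meet."""
    name: str
    # Members who have information that b holds (condition 1).
    known_to: set = field(default_factory=set)
    # Members to whom b indicates that all of them have information
    # that b holds (condition 2), simplified here to a set of names.
    open_to: set = field(default_factory=set)
    # For each member, the propositions that b indicates (condition 3).
    indicates: dict = field(default_factory=dict)


def is_common_ground(p, members, basis):
    """Check the three CG-shared-basis conditions for proposition p and set S."""
    members = set(members)
    each_has_basis = members <= basis.known_to
    basis_is_open = members <= basis.open_to
    basis_indicates_p = all(p in basis.indicates.get(m, set()) for m in members)
    return each_has_basis and basis_is_open and basis_indicates_p


P = "Ken and Maggy expect to meet at the British Museum at noon"
agreement = Basis(
    name="verbal agreement",
    known_to={"Ken", "Maggy"},
    open_to={"Ken", "Maggy"},
    indicates={"Ken": {P}, "Maggy": {P}},
)

print(is_common_ground(P, {"Ken", "Maggy"}, agreement))           # True
print(is_common_ground(P, {"Ken", "Maggy", "Julia"}, agreement))  # False
```

The check involves a fixed number of tests per member, in contrast to the unbounded list of propositions that CG-iterated requires.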

CG-reflexive is CG-shared-basis stripped of any reference to the evidence on which it is based (Clark & Marshall, 1981; Cohen, 1978; Harman, 1977):

Proposition p is common ground for the members of set S if and only if

(x) each of the members has information that p and that x.

This representation is reflexive; the proposition labeled x contains a reference to itself (just as “This sentence contains five words” contains a reference to itself). It is simply a more general version of the Ken–Maggy example. Some theorists object to reflexive representations in principle, but others have shown that reflexive representations are perfectly legitimate for many purposes (Barwise, 2016; Barwise & Etchemendy, 1987).

Despite their differences, the three representations are related. CG-shared-basis reduces to CG-reflexive with the removal of basis b. And with the right assumptions, CG-reflexive can be expanded into as many lines of CG-iterated as needed. The three representations are each useful but for different purposes (see Barwise, 2016).

Broader connections

The concept of common ground bears on issues in many areas of scholarship. Two of these areas are fiction and artificial agents.

One issue is how to track common ground in fiction. Consider Shakespeare’s Hamlet. In act three scene four, Hamlet is talking with his mother Gertrude. When Hamlet hears muttering behind a nearby wall hanging, he suddenly draws his sword and thrusts it through the wall hanging, killing the man behind it. The man happens to be Polonius, whom Gertrude had placed there to eavesdrop on their conversation. The audience at a performance of Hamlet has no trouble grasping what happened in that scene and why.

However, what that audience does is remarkable. Among other things, the audience infers: Gertrude knew Polonius was there, but Hamlet did not, and Gertrude knew Hamlet did not know; thanks to the dramatist, the audience knows what all three of these characters knew. That is, the audience infers distinct bodies of common ground for (a) Gertrude and Hamlet, (b) Gertrude and Polonius, (c) Hamlet and Polonius, and (d) the dramatist and his projected audience. To understand the scene, the audience had to track these four sets of common ground and the disparities among them. (The audience also shares common ground with the three actors on stage, adding yet another layer to the analysis.) Shakespeare created disparities like these in all of his plays (e.g., with prologues, asides, soliloquies, and cross-dressing). The question is how audiences keep track of them all.

What is true of Shakespeare is true of fiction in general. Consider Melville’s Moby-Dick. In the first sentence (“Call me Ishmael”), the apparent author introduces readers to Ishmael, who proceeds to narrate a series of adventures in the first person. From the very start, readers assume that the apparent author takes vast areas of generic 19th-century knowledge as communal common ground with his 19th-century readers and that Ishmael the narrator takes much of the same information as communal common ground with the “landsmen” he is speaking to. In the course of the novel, readers see how Ishmael the character creates distinct bodies of personal common ground with Queequeg, Starbuck, Stubb, Ahab, and other characters. Without tracking all these bodies of common ground, readers would be lost.

A further issue is how people interact with artificial agents [see Large Language Models]. Communication with them is often one-way. People hear announcements in elevators (“Third floor” and “Door opening”), airports (“Welcome to Denver”), and even bathroom scales (“165 pounds”), and they comply with prerecorded requests in the London Underground (“Please stand clear of the closing doors”) and in cars (“Turn left in 100 yards”). In other cases, communication is two-way. People take turns speaking with the virtual agents Siri and Alexa and with prerecorded voices in telephone calls to drug stores and airlines. The medium may be speech or print.

People tacitly know, however, that artificial agents are not real agents no matter how realistic they seem. They take it that artificial agents represent or depict fictional characters and that these characters are unlike humans in specific ways. They are nonstandard agents.

One way they are nonstandard is in memory. Human-like chatbots lack human-like biographies: they lack mothers, fathers, siblings, childhood experiences, hometowns, college majors and degrees, hobbies, sports, and other distinctive information. Without biographical memories like these, they are limited in the personal common ground they can establish with others. Human-like chatbots also lack membership in cultural communities. They are never identified as MIT graduates, IBM employees, electrical engineers, Lutherans, women, ski buffs, or Red Sox fans. As a result, their generic memory, that is, their general knowledge of the world, is not organized by cultural communities. The only community that English-speaking chatbots share with humans is “English speakers,” and that is limiting. In talking to a chatbot, nuclear physicists will wonder whether they can discuss the Higgs boson, leptons, and the flavors of quarks as they would with fellow physicists.

The strategy people appear to adopt with artificial agents is this (Clark & Fischer, 2023): people act on the pretense that the artificial agent they are communicating with is a real agent but of a nonstandard type, and they try to establish personal and communal common ground with the agent consistent with that pretense.

Further reading

  • Clark, H. H. (1996). Using language. Cambridge University Press.

  • Lewis, D. (1969). Convention: A philosophical study. Harvard University Press.

  • Stalnaker, R. (2014). Context. Oxford University Press.

References

  • Aumann, R. J. (1976). Agreeing to disagree. Annals of Statistics, 4(6), 1236-1239. https://doi.org/10.1214/aos/1176343654

  • Barwise, J. (2016). Three views of common knowledge. In H. Arló-Costa, V. F. Hendricks, & J. Van Benthem (Eds.), Readings in formal epistemology: Sourcebook (pp. 759-772). Springer.

  • Barwise, J., & Etchemendy, J. (1987). The liar: An essay on truth and circularity. Oxford University Press.

  • Brennan, S. E., Galati, A., & Kuhlen, A. K. (2010). Two minds, one dialog: Coordinating speaking and understanding. In B. H. Ross (Ed.), Psychology of learning and motivation (Vol. 53, pp. 301-344). Elsevier.

  • Clark, H. H. (1996). Using language. Cambridge University Press.

  • Clark, H. H. (1998). Communal lexicons. In K. Malmkjaer & J. Williams (Eds.), Context in language learning and language understanding (pp. 63-87). Cambridge University Press.

  • Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127-149). American Psychological Association.

  • Clark, H. H., & Fischer, K. (2023). Social robots as depictions of social agents. Behavioral and Brain Sciences, 46, e21. https://doi.org/10.1017/S0140525X22000668

  • Clark, H. H., & Marshall, C. R. (1981). Definite reference and mutual knowledge. In A. K. Joshi, B. Webber, & I. A. Sag (Eds.), Elements of discourse understanding (pp. 10-63). Cambridge University Press.

  • Clark, H. H., & Schaefer, E. F. (1987). Collaborating on contributions to conversations. Language and Cognitive Processes, 2(1), 19-41. https://doi.org/10.1080/01690968708406350

  • Clark, H. H., & Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13(2), 259-294. https://doi.org/10.1016/0364-0213(89)90008-6

  • Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22(1), 1-39. https://doi.org/10.1016/0010-0277(86)90010-7

  • Cohen, P. R. (1978). On knowing what to say: Planning speech acts [Doctoral dissertation, University of Toronto].

  • Fillmore, C. (1997). Lectures on deixis. University of Chicago Press.

  • Harman, G. (1977). Review of Linguistic behaviour by Jonathan Bennett. Language, 53(2), 417-424. https://doi.org/10.2307/413111

  • Lewis, D. (1969). Convention: A philosophical study. Harvard University Press.

  • Schegloff, E. A. (1972). Notes on a conversational practice: Formulating place. In D. N. Sudnow (Ed.), Studies in social interaction (pp. 75-119). Free Press.

  • Schegloff, E. A. (2000). On granularity. Annual Review of Sociology, 26, 715-720. https://doi.org/10.1146/annurev.soc.26.1.715

  • Schelling, T. C. (1960). The strategy of conflict. Harvard University Press.

  • Schiffer, S. R. (1972). Meaning. Clarendon Press.

  • Stalnaker, R. (2002). Common ground. Linguistics and Philosophy, 25(5/6), 701-721. https://doi.org/10.1023/A:1020867916902

  • Stalnaker, R. (2014). Context. Oxford University Press.

  • Stalnaker, R. C. (1978). Assertion. In P. Cole (Ed.), Syntax and semantics 9: Pragmatics (pp. 315-332). Academic Press.

  • Svennevig, J. (2000). Getting acquainted in conversation: A study of initial interactions. John Benjamins.

  • Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory. Academic Press.