Language, arguably the oldest human technology, is also by far the most critical to our species and to the ongoing course of technological progress: every significant development in the technology of language (speech, writing, printing, code) has launched a new era of human civilization. Yet for such a foundational, robust, and ancient technology, language is a limited and flawed vehicle for communication.

In order to communicate, language users must bind phenomenological experience to symbols through demonstration or suggestion. Within a large community of language users, the semantic content of such symbols becomes compoundingly abstracted, and only these compounded abstractions can effectively serve as the intersubjective content on which human communication and collaboration operate. This constrains our ability to effectively engage with irreducibly subjective experiences, radically inhuman phenomena, or anything else that falls outside the common distribution of collective human experience. Furthermore, because the relationship between the semantic content of language and underlying phenomena is symbolic rather than material, language can be used to communicate falsehoods, whether intentionally or unintentionally. While this flexibility can be a strength, there are self-evident downsides. The phenomena underlying language also frequently change, and there can be a significant lag between such changes and corresponding shifts in the use and semantic content of the language that refers to them. In addition, natural language is constrained by its sequential nature, making it ill-suited to representing multimodal information like images, smells, or high-dimensional tensors. Because of all this, the semantic content communicated by natural language is vague, inefficient, and difficult to verify.

While formal languages, such as those of mathematics, logic, and code, are designed to avoid some of these problems, what they can communicate about is limited to other formal, symbolic truths. We can compensate for this shortcoming by incorporating verified data and relying on symbolic decision-making rules, but these methods only return us to the root issues of natural language: there is no law of physics that prevents a mathematically consistent logic from binding to semantic content in such a way that it arrives at false conclusions. Additionally, there are certain statements that even formal symbolic languages are incapable of expressing, such as statements assigning a truth value to paradoxes (e.g., “I am lying right now,” or “the set of all sets that do not contain themselves contains itself”). Assuming physicalism, this divergence between the space of all possible linguistic worlds and the space of all possible physical worlds should make us concerned about the fidelity of language as a representational medium.
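The second of those paradoxes, Russell’s, shows the problem in a single line. Assuming standard set-builder notation, merely defining the paradoxical set yields a contradiction:

```latex
R = \{\, x \mid x \notin x \,\} \quad \Longrightarrow \quad R \in R \iff R \notin R
```

Whichever truth value we assign to “R contains itself,” the definition forces the opposite, so the formal language can pose the question but cannot consistently answer it.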

Over the course of our species’ history, our use of language has transformed and evolved, moving from iconic visual communication, to indexical gestural communication, to symbolic natural languages, to formal programmable languages. We believe we have not yet reached the upper limit on the utility of the media through which we communicate and represent the world.


In recent years, a growing field of artificial intelligence research, referred to as “emergent communication,” has been developing frameworks for evolving novel linguistic communication protocols from scratch through reinforcement learning simulations. In these experiments, RL agents learn to communicate with each other in order to coordinate and achieve shared goals. For example, in some settings agents develop a language through which they successfully refer to and identify various images, while in others they learn to give each other directions to navigate an environment. It’s important to note that when training begins, agents don’t have any sense of what language is—they simply have access to a channel through which they can send symbols to each other. For thousands of training cycles, their use of these channels is entirely random. Slowly, however, the message passing begins to converge towards consistent, interpretable communication protocols that both agents can rely on to understand their environment and inform their actions.
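To make this concrete, here is a minimal sketch of such a setup in the spirit of a Lewis-style referential game. The tabular softmax agents, hyperparameters, and reward scheme below are illustrative simplifications, not the architecture of any particular paper: a sender signals which of five objects it observes, a receiver guesses from the symbol alone, and both agents are rewarded only when the guess is correct.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OBJECTS, VOCAB, LR = 5, 5, 0.1

sender_logits = np.zeros((N_OBJECTS, VOCAB))    # sender policy: P(symbol | object)
receiver_logits = np.zeros((VOCAB, N_OBJECTS))  # receiver policy: P(guess | symbol)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

baseline = 0.0  # running average reward, used as a variance-reducing baseline
for step in range(20_000):
    target = rng.integers(N_OBJECTS)

    # The sender emits one symbol; early in training this is pure noise.
    p_s = softmax(sender_logits[target])
    symbol = rng.choice(VOCAB, p=p_s)

    # The receiver interprets the symbol as a guess about the target.
    p_r = softmax(receiver_logits[symbol])
    guess = rng.choice(N_OBJECTS, p=p_r)

    # Shared reward: both agents succeed or fail together.
    reward = float(guess == target)
    advantage = reward - baseline
    baseline += 0.01 * (reward - baseline)

    # REINFORCE update: raise the log-probability of the sampled actions
    # in proportion to the advantage (grad of log-softmax = onehot - probs).
    grad_s = -p_s
    grad_s[symbol] += 1.0
    sender_logits[target] += LR * advantage * grad_s

    grad_r = -p_r
    grad_r[guess] += 1.0
    receiver_logits[symbol] += LR * advantage * grad_r

# After training, greedy decoding usually recovers a consistent code:
# each object gets a dedicated symbol and the receiver inverts the mapping.
accuracy = np.mean([
    float(receiver_logits[sender_logits[o].argmax()].argmax() == o)
    for o in range(N_OBJECTS)
])
print(f"greedy protocol accuracy: {accuracy:.2f}")
```

Replacing the lookup tables with neural encoders over images or grid-world observations, and single symbols with symbol sequences, yields the kinds of settings studied in the literature discussed below.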

In the early 14th century, the Majorcan philosopher and theologian Ramon Llull developed the Ars Generalis Ultima, a proposal for a universal logic which introduced the idea that thinking is a computational process involving the combination of symbols, and that this process could be mechanically automated. Several hundred years later, Llull’s system inspired Gottfried Leibniz to develop the Characteristica Universalis, a universal language for science in which the form of each character would be logically bound to its meaning. Such a language would transparently expose the underlying nature of the world, allowing the logical combination of symbols to become the means towards genuine discovery. But Leibniz ultimately recognized his project as impractical—not only was there no perfect way to map physics to symbols, but having similar symbols refer to similar phenomena makes a language impossible to remember and understand. “Ladder” and “lather” can be formally similar precisely because they’re semantically dissimilar: their contexts help us disambiguate their meanings even in the presence of noise.

In the 20th century, a series of artificially constructed languages flourished in the hope of facilitating international trade and diplomacy, while the logicism of Frege, Russell, and Whitehead and the logical positivism of Carnap strove to develop an unshakeable logical foundation on which all linguistic and scientific thought could be built. Today, however, Esperanto and Loglan are spoken only by hobbyists, and logical positivism ultimately retreated to a much more modest position after its basic aims were proven impossible by Gödel’s incompleteness theorems and Tarski’s undefinability theorem, as well as by the critiques of one of its original progenitors, Wittgenstein. Wittgenstein’s first book, the Tractatus Logico-Philosophicus, argued that the purpose of philosophy is to analyze language and distinguish between intelligible and nonsensical discourse. In the Philosophical Investigations, published posthumously three decades later, Wittgenstein presented an explicitly contrary position: language is not a representation of the world, but an action. He described language as a series of games in which meaning arises and fades away depending on context—independent of use, meaning does not exist.


The first paper to demonstrate the emergence of machine-to-machine language ex novo through reinforcement learning was Foerster et al., 2016, which features agents that develop communication protocols enabling optimal solutions to two well-known and challenging riddles (the hats riddle and the switch riddle) in which prisoners must devise a communication code that will allow them to escape execution. Within a year, two additional papers were published that greatly expanded on Foerster’s research: Sukhbaatar et al., 2016, and Lazaridou et al., 2017. These respectively demonstrated the emergence of language to (1) facilitate navigation in various cooperative and competitive grid-world scenarios, and (2) correctly identify images in Lewis-style referential games. Since these initial papers, a growing number of researchers have begun exploring the possibilities of emergent communication, with investigation oriented around several interrelated threads.

Firstly, a number of researchers have sought to use emergent communication as a tool to study the emergence, evolution, and structure of natural language, extending the tradition of computational linguistics. Earlier work in this vein examined the emergence of language structure through evolutionary methods (Grouchy et al., 2016), but RL-based emergent communication offers a more robust toolset than previous approaches. An illustrative example is Graesser et al., 2020, in which the authors train distinct populations of agents to communicate with each other and then measure how the various agent languages change as these isolated linguistic communities come into contact. The authors discover the emergence of dialects and pidgins between agents from different communities, as well as linguistic continua across adjacent language communities, and they show that languages become more compositional as they are spoken by more diverse populations. In this way they suggest a computational framework for contact linguistics, a subject that can’t otherwise be observed under controlled experimentation.

This linguistics-driven approach has prompted research on the design of learning environments and agent constraints to push emergent languages towards more human-like qualities. The most sought-after feature has been compositionality (Havrylov et al., 2017, Mordatch et al., 2018, Chaabouni et al., 2020), the capacity for expressions to be combined or separated to refer to novel concepts, primarily because it often (though not always) partially entails the other most relevant features of interest, such as generalization (Tucker et al., 2022, Mu et al., 2021, Ohmer et al., 2022, Hagiwara et al., 2021), learnability (Graesser et al., 2020, Li et al., 2019, Kharitonov et al., 2020), robustness to noise (Kharitonov et al., 2020, Tung et al., 2021), concision (Chaabouni et al., 2019, Kalinowska et al., 2022), logical structure (Mu et al., 2021, Ohmer et al., 2022), and human interpretability (Kottur et al., 2017, Dessi et al., 2021, Tucker et al., 2021, Karten et al., 2023).
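As a concrete example of how one of these properties is quantified, compositionality is commonly approximated by topographic similarity: the correlation between pairwise distances in meaning space and pairwise distances in message space, a metric used in, for instance, Chaabouni et al., 2020. A compositional code maps similar meanings to similar messages; a holistic code does not. The toy meanings and codes below are illustrative stand-ins for an agent’s actual messages:

```python
# Sketch of topographic similarity: Spearman correlation between pairwise
# meaning distances and pairwise message distances. Toy data only.
import random
from itertools import combinations
from scipy.stats import spearmanr

def hamming(a, b):
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def topographic_similarity(meanings, messages):
    """Near 1.0 when similar meanings receive similar messages."""
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [hamming(messages[i], messages[j]) for i, j in pairs]
    rho, _ = spearmanr(d_meaning, d_message)
    return rho

# Meanings: all (color, shape) attribute pairs.
meanings = [(c, s) for c in range(3) for s in range(3)]

# A perfectly compositional code: one symbol per attribute, concatenated.
compositional = [(10 + c, 20 + s) for c, s in meanings]

# A holistic code: the same words, assigned to meanings arbitrarily.
random.seed(0)
holistic = random.sample(compositional, k=len(compositional))

print(topographic_similarity(meanings, compositional))  # 1.0
print(topographic_similarity(meanings, holistic))       # much lower, near 0
```

The same distance-correlation scaffolding extends to the real case: meanings become attribute vectors of environment states, messages become the symbol sequences agents actually emit, and edit distance typically replaces Hamming distance for variable-length messages.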