As I write this in May 2023, we are seeing the emergence of AI with human-level IQ and conversational abilities. I believe the disruptive change of the next few years will greatly challenge us all. The initial challenge many people face when encountering generative AI models like ChatGPT or Midjourney is grappling with the idea that these systems can actually think and be creative in ways very similar to how people do. In fact, they emulate our cognition so closely that many techniques derived from psychology and cognitive science apply directly to AI models. For example, prompting - telling an AI something in order to bring that information into its attentional window - is a classic cognitive technique widely used in the influence professions, such as advertising.
The notion that an algorithm running on a computer can hold a meaningful conversation with us or produce emotionally impactful art disconcerts many. This initial pushback is often encapsulated in the idea that large language models (LLMs) are “stochastic parrots,” and sometimes in its more sophisticated variant, the “Chinese Room.” These are essentially philosophical arguments that a system can appear to be intelligent while merely manipulating linguistic symbols with no understanding of their meaning. Comforting as they might be to some, these ideas are simply not correct. In this article I will explain why, whether generative AIs think (short answer: yes), and what is wrong with the Chinese Room idea. Along the way we will gain insight into the power of language in human cognition.
Scientists from Microsoft Research who evaluated a pre-release version of OpenAI’s GPT-4 wrote a paper entitled “Sparks of Artificial General Intelligence: Early experiments with GPT-4.” In the accompanying lecture, researcher Sébastien Bubeck candidly tells the audience they need to abandon the notion that LLMs simply follow linguistic patterns without any understanding. Understanding is testable, he points out. Cognitive scientists have an array of tests they use to evaluate humans: theory-of-mind tests that evaluate how well the subject understands the viewpoint of others, intelligence tests, tests of common-sense reasoning, tests of general and specific knowledge, tests of knowledge of real-world physical situations, and others. Since LLMs can engage in dialogue, researchers can simply ask them questions to test these properties.
These tests have been given to LLMs, charting their cognitive evolution through recent generations. The best LLMs now perform at or above the level of most humans, even at tasks we consider quintessentially human, such as understanding how one person’s comments affect another person’s feelings. You can follow AI cognitive testing at Dr. Alan D. Thompson’s archive. We know LLMs understand situations because we can pose probing questions about those situations and receive answers that could only come from an internal model of the situation. Hence, LLM AIs are not “stochastic parrots.”
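To make this concrete, here is a minimal sketch of how such a probing question can be posed programmatically. It assumes the openai Python package with its 2023-era ChatCompletion interface and an API key; the question is a standard Sally-Anne-style false-belief scenario of my own wording, not an item from any particular published benchmark.

```python
# Sketch: posing a theory-of-mind probe to an LLM via the OpenAI chat API
# (pre-1.0 "openai" Python package, as used in 2023).
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: replace with a real key

# A classic false-belief scenario: answering correctly requires modeling
# what Sally believes, which differs from what is actually true.
question = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is gone, Anne moves the ball into the box. "
    "When Sally returns, where will she look for her ball, and why?"
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": question}],
    temperature=0,  # keep the reply as deterministic as possible for evaluation
)

print(response.choices[0].message.content)
# An answer of "the basket, because Sally doesn't know the ball was moved"
# can only come from a model of Sally's (false) belief about the situation.
```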
The “Chinese Room” analogy, developed by philosopher John Searle, purports to be a more sophisticated treatment of how a system can seem intelligent without any understanding. In fact, the argument rests on both fallacies and a manipulative cognitive trick. It needs, in my opinion, to be thoroughly deconstructed, so let us proceed to do that next.
The “Chinese Room” thought experiment describes a room with an agent inside. Symbols are passed in; the agent looks them up in tables, transcribes them into other symbols (translating them into another language, such as Chinese, which the agent does not understand), and passes them back out. An experimenter interacting with the room might believe the room understands Chinese, but in reality no part of the room truly comprehends Chinese.
The analogy is supposed to show how a non-intelligent entity can produce results that appear intelligent without truly understanding what it is doing. This premise is incorrect for several reasons. Moreover, it is a persuasion argument designed to manipulate the human cognitive system by invoking a false analogy. The scenario leads the listener to instantly picture a simple sequence translated through a simple lookup table. The listener then incorrectly extends that picture to responding intelligently in a language the agent does not know, as if the ability to translate simple sequences through lookup tables equated to producing cogent responses in a foreign language. This is a fallacy.
Language translation is far more complex than a simple list of rules - if it were that straightforward, we would have had Google Translate in the 1950s. For instance, the English word “field” has multiple meanings depending on context. Translating it requires tracking the other parts of the sentence, the surrounding sentences, and the overall conversation. In the parlance of information technology, we say that effective translation is “stateful.” To adequately perform sentence translation, the Chinese Room needs a system for tracking state.
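To illustrate the point, here is a toy sketch (not how real translation systems work): a stateless lookup table is forced to pick a single rendering of “field” every time, while even a crude bit of state - the words seen so far in the conversation - lets the system choose between, say, French “champ” (a farmer’s field) and “domaine” (a field of study). The word lists and cue rules below are invented purely for illustration.

```python
# Toy illustration of why translation is "stateful": the right rendering of
# "field" depends on context accumulated from the rest of the conversation.
# Vocabulary and cue words are invented for illustration only.

STATELESS_TABLE = {"field": "champ"}  # a stateless table must pick one meaning

def translate_stateless(word: str) -> str:
    return STATELESS_TABLE.get(word, word)

def translate_with_state(word: str, context: list[str]) -> str:
    """Pick a translation of 'field' using the words seen so far as state."""
    if word != "field":
        return STATELESS_TABLE.get(word, word)
    academic_cues = {"physics", "study", "research", "theory"}
    farming_cues = {"plow", "wheat", "cows", "harvest"}
    # The "whiteboard": count cue words already seen in the conversation.
    if sum(w in academic_cues for w in context) > sum(w in farming_cues for w in context):
        return "domaine"   # field of study
    return "champ"         # agricultural field

print(translate_stateless("field"))                            # champ, always
print(translate_with_state("field", ["physics", "research"]))  # domaine
print(translate_with_state("field", ["wheat", "harvest"]))     # champ
```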
Assume, for argument's sake, that we equip the Chinese Room with a large whiteboard to track state. Since managing a stateful system is much more complex than managing a stateless one, and the rules multiply exponentially in complexity, the lookup tables soon proliferate into an entire library of volumes covering various conversational contexts. However, determining the conversational context is not a precise task - it requires ongoing statistical analysis of what words are likely to mean. How can we solve this?
Now we must add a computational capability for statistical analysis to the Chinese Room. So let's append a door leading to an annex of sub-rooms containing other agents with a plethora of abacuses and whiteboards to perform the analysis. Is this feasible? Yes - a sufficiently sophisticated stateful system with enough computation can deliver good translation. At last, we have recreated Google Translate, but we had to add state and computation, both of which were absent from the original Chinese Room analogy.
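As a sketch of what those abacus-filled sub-rooms are computing, consider the simplest possible statistical approach: score each candidate context by how probable the observed words would be in it, and pick the most likely one. The tiny frequency tables below are invented for illustration; a real system would learn them from large amounts of data.

```python
# Toy sketch of the "annex of sub-rooms": estimating the most likely
# conversational context from word statistics.
from math import log

# P(word | context) - invented numbers for illustration only.
WORD_FREQS = {
    "farming":  {"field": 0.05, "wheat": 0.04, "harvest": 0.03, "the": 0.07},
    "academia": {"field": 0.04, "study": 0.05, "theory":  0.03, "the": 0.07},
}
UNSEEN = 1e-4  # smoothing for words missing from a context's table

def most_likely_context(words):
    """Score each context by the summed log-probability of the words seen."""
    scores = {
        context: sum(log(freqs.get(w, UNSEEN)) for w in words)
        for context, freqs in WORD_FREQS.items()
    }
    return max(scores, key=scores.get)

print(most_likely_context(["the", "wheat", "harvest", "field"]))  # farming
print(most_likely_context(["the", "study", "theory", "field"]))   # academia
```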
However, translation is not conversation. We don’t possess a room that can speak Chinese; we simply have a system that can translate Chinese effectively. The second piece of persuasion sleight-of-hand that Searle employs is to substitute the task of translation - a subset of language use - for the significantly more complex problem of conducting a conversation. What Searle originally conceived with the Chinese Room was a system that could pass the Turing test - convincing the examiner that the system understands a topic in the way a human would. But translation is a distinct problem. When two individuals converse using a translator as an intermediary, the translator must have a certain level of intelligence to translate the conversation effectively, but the majority of the higher-level understanding lies within the two conversational parties, not the translator. Searle replaced a system that can pass the Turing test, i.e. maintain its own end of a meaningful conversation, with one that handles the simpler problem of translating the conversation. Responding intelligently in a foreign language requires much more than even our “Chinese Room Plus,” augmented with state storage and computation, can provide.
The most immediate requirement is randomness. The Chinese Room would fail the Turing test the moment the examiner asked the same question twice: lacking any source of randomness, the room would invariably give exactly the same response every time. To pass a Turing test, the room must mimic conversation more closely. This requires that the lookup tables now contain statistical probabilities for each word, and that the room have a source of randomness - perhaps a set of dice.
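Here is a sketch of what those dice buy us: instead of always returning the single most probable entry, the room samples a reply from a probability table, so the same question can produce different but still plausible answers. The table of replies and their probabilities is invented for illustration.

```python
# Toy sketch: why the room needs dice. A deterministic lookup gives the same
# reply every time; sampling from a probability table produces varied but
# still plausible replies. Replies and probabilities are invented.
import random

REPLIES = {
    "How are you?": [
        ("I'm well, thank you.", 0.5),
        ("Fine, and you?",       0.3),
        ("Can't complain.",      0.2),
    ],
}

def reply_deterministic(prompt: str) -> str:
    # Always picks the highest-probability entry: ask twice, get the same answer.
    return max(REPLIES[prompt], key=lambda pair: pair[1])[0]

def reply_sampled(prompt: str, rng: random.Random) -> str:
    # Rolls the dice: the same prompt can yield different answers.
    options, weights = zip(*REPLIES[prompt])
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random()
print(reply_deterministic("How are you?"))  # identical on every call
print(reply_sampled("How are you?", rng))   # varies from call to call
print(reply_sampled("How are you?", rng))
```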
More broadly, an intelligent entity needs randomness to function effectively because some paths lead to local minima, and the only way to escape one is to choose a different path. Repeating the same sentence given the same input is a narrow instance of this problem; in general, cognitive tasks such as problem solving need randomness at a much larger scale.
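As a sketch of that broader point: a purely greedy search always settles into the nearest minimum, while random restarts - choosing a different path - let it find a better one. The function, step size, and restart count below are invented for illustration.

```python
# Toy sketch: randomness lets a search escape a local minimum.
# f(x) = x^4 - 3x^2 + x has a shallow minimum near x ≈ 1.1 and a deeper
# (global) minimum near x ≈ -1.3. Function and step size are illustrative.
import random

def f(x: float) -> float:
    return x ** 4 - 3 * x ** 2 + x

def greedy_descent(start: float, step: float = 0.01, max_steps: int = 2000) -> float:
    """Always move downhill; stops at whichever minimum is nearest the start."""
    x = start
    for _ in range(max_steps):
        best = min((x - step, x + step), key=f)
        if f(best) >= f(x):
            break
        x = best
    return x

# A deterministic search started at x = 3 gets stuck in the shallow minimum.
stuck = greedy_descent(3.0)
print(f"greedy:   x = {stuck:.2f}, f(x) = {f(stuck):.2f}")

# Random restarts (the "different path") eventually land in the basin of
# the deeper minimum.
rng = random.Random(0)
best = min((greedy_descent(rng.uniform(-3, 3)) for _ in range(10)), key=f)
print(f"restarts: x = {best:.2f}, f(x) = {f(best):.2f}")
```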