Why are LLMs so effective? Because language is thought
Dragana and I experienced telepathy. Twice.
It was not a fluke. It was not our imagination. We experienced cold, hard telepathy. This happened on two occasions where, for a brief period of time, we knew exactly what the other person was thinking. We became deeply connected by the exact same train of thought; the same sequence of words traversed our minds. These thoughts were triggered by situations only possible given our shared experiences: our history as friends, our immigrant upbringings, our AI-related jobs, and English as our common language. The first time, we simply laughed it off. The second time, we realized that this could only have happened given our shared understanding of reality, powered by a level of specificity that only language can convey. This realization is at the heart of the current AI revolution, and it is what makes Large Language Models (LLMs) work so scarily well.
Part 1 - Language as a substrate of thought
Languages carry a bottomless depth of meaning. From a large but still finite number of words, we can create an infinite number of meaningful and extremely specific ideas that we can transmit to others. An illustration of this is how, when a relationship dies, a dialect dies with it. The words and phrases each couple uses carry such profound meaning that they become part of their shared personality. Douglas Hofstadter, a great inspiration for this post, calls this phenomenon a low-resolution copy of each other's mind in his book "I Am a Strange Loop" (Chapter 17, "How We Live in Each Other"). An example: Dragana's native language is Serbian and she is married to R, whose mother tongue is Dutch. Their relationship has, from its start, been built in English, which is their second language. And even when Dragana becomes fluent in Dutch (one day, many years into the future), they will still speak English to each other, because they have developed their love-layer on top of the English language, specific and unique to only the two of them. The English language is not merely a means of communication but part of the relationship itself!

The love-layer example shows a foundational principle of languages, one that extends to siblings, friends, family units, communities, cultures, ethnic groups and beyond. The infinite layers of specificity and meaning are created by shared semantic composition among different groups of people. This is also why Scots has 421 words for concepts around snow, and why accurately translating between languages is extremely hard. Take the Spanish-adopted word apapachar: it can be roughly translated as "to hug with the soul", but this translation is completely devoid of the beauty and impact of what an apapacho feels like. Certainly, direct human-to-human communication is much more than just the words in a language.
There's the tone, both written and spoken, the body movements, the emojis you use and even the subtle facial expressions that our brains are so primed to detect. This is precisely why language alone can only get you so far: it's just one of the many layers that make up communication.
Our first telepathic encounter involved our shared friend Y. We were having drinks at his place when he pulled out some fancy French Vanilla ice cream. Paolo said, "Wow man, French Vanilla", and we both thought the exact same thing: Y is very bougie. We each understood exactly what this conveyed: both a true appreciation of a friend sharing high-quality ice cream with us and a tiny hint of acknowledgement that our friend demands, and enjoys, certain standards. I (Dragana) looked at Paolo and we both started laughing frantically. We were not only thinking the exact same thing, but we knew that the other person was thinking it. That was the telepathic moment. We drew from the same database of experiences, inside jokes and mutual observations. Paolo's "Wow man, French Vanilla" triggered the same shared meaning. We had built up enough shared semantic composition that minimal language conveyed maximal meaning.
The examples above show why human language is such an extraordinary cognitive tool that, in a way, it is our thought. We can surely think ideas and feel feelings without words, but they only materialize when we can describe them using words. Language formalizes our thinking process. Take your own emotions as an example, and how damn hard it is to precisely pin them down using words alone. Clearly, we are not even close to being the first ones to notice this phenomenon. To be fair, research also shows that there is more to thought than the languages we use. However, there is increasing evidence that language does shape thought. Furthermore, we also needed a clickbaity title.
Part 2 - Why do LLMs work so well?
Now, let's completely switch context from psychology and linguistics to the world of mathematics and Artificial Intelligence. For centuries, mathematicians have been trying to represent and abstract reality with symbols, equations and diagrams. From geometry to calculus, math has been able to accurately describe phenomena, and it's all worked quite well! At times it seems that math is baked into reality. As an abstraction, math is quite useful, underpinning most, if not all, of humanity's technology. From the wheel to the transistor, there's some mathy abstraction that can describe its workings. When reality became too complex to describe exactly, mathematicians invented probability and statistics. These disciplines allowed us to quantify uncertainty and capture general trends in data, synthesizing them into, unfortunately for our day-to-day jobs, actionable insights. Fast forward to today and our data-rich world. Can we also find trends in language data? Well, it turns out that if you add a couple of teraflops of compute power and stack layers and layers of linear regressions, boom, magic happens. Suddenly, your computer starts speaking back to you.
The connection between language and AI sits at the core of Large Language Models (LLMs). At the heart of ChatGPT, Claude, Gemini and all the recent vaguely Englishman-named AI products, there is a Large Language Model. An LLM, as its name suggests, is a model (a representation of reality) that takes language, a piece of text, as its input and tries to autocomplete it, i.e. it also outputs a piece of text. To achieve this feat, LLMs first break down the input text into tokens: think a word, a piece of a word or even a punctuation mark. As an example, the word fishing means something very different from fish and the -ing suffix by themselves; those pieces, fish and -ing, are the tokens. Next, these tokens are converted into long lists of numbers called vectors. These lists of numbers are not random at all; their positions relative to each other are what retain the semantics and context. This means that tokens with relatively similar meanings have vectors that are close together. This is what we nerds call embeddings, and there are some great interactive visualizations of them online. Then, these embedded vectors are passed through a series of mathematical operations called a deep learning network. You can think of this network as a machine, and its parameters as finely tuned knobs, elegantly combined to process text in a way that seems to understand it. So, if we input an incomplete sentence, "I like fishing in the ____", the LLM will try to complete it. A few likely next words could be river, sea, lake or pond; the word rock, however, would be unlikely, and therefore meaningless here. Last, the LLM draws one of the likely words at random, filling in the blank in a way that's statistically coherent. This randomness is, paradoxically, what allows LLMs to sound somewhat human and generate interesting texts.
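For the nerds who like to see things run: here is a toy sketch of the two ideas above, embeddings and sampling. The hand-crafted 3-number vectors and made-up scores are pure illustration (real LLMs learn vectors with hundreds of dimensions and compute scores with billions of knobs), but the mechanics are the same: similar words sit close together, and the next word is drawn at random, weighted by likelihood.

```python
import math
import random

# Toy "embeddings": hand-crafted 3-D vectors standing in for the long
# learned vectors real LLMs use. Watery words cluster together.
embeddings = {
    "river": [0.9, 0.1, 0.0],
    "lake":  [0.8, 0.2, 0.1],
    "sea":   [0.9, 0.0, 0.2],
    "rock":  [0.0, 0.9, 0.8],
}

def cosine(u, v):
    """Similarity of two vectors: close in meaning => close to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(embeddings["river"], embeddings["lake"]))  # high: similar meaning
print(cosine(embeddings["river"], embeddings["rock"]))  # low: unrelated

# Made-up scores a model might assign to candidate next tokens for
# "I like fishing in the ____". Higher score = more plausible.
scores = {"river": 3.0, "sea": 2.5, "lake": 2.2, "rock": -2.0}

# Softmax turns raw scores into probabilities that sum to 1.
exps = {w: math.exp(s) for w, s in scores.items()}
total = sum(exps.values())
probs = {w: e / total for w, e in exps.items()}

# The sampling step: draw one word at random, weighted by probability.
# Almost always river/sea/lake, almost never rock.
next_word = random.choices(list(probs), weights=list(probs.values()))[0]
print("I like fishing in the", next_word)
```

Run it a few times and the blank fills differently on each run; that weighted coin-flip is the randomness that makes LLM output feel alive rather than canned.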
And that is how LLMs work; very, very, VERY roughly. However, why they work is far more subtle. It turns out that the fundamental principles that make language so profoundly rich can be found within all of humanity's written work. It's only natural, because language is part of the cognitive toolkit that has allowed humanity to build a shared understanding of reality. The inflections that make Italian sound so significant. The combinations of nouns and adjectives that writers wield to make a text provoke emotions in you. And yes, even some of the cultural context that the word apapachar conveys. In the process of training the LLM, we fine-tune those knobs to make the output make sense given an input. When we use all of humanity's written work as input-output combinations, the fundamental principles of language get baked into the machinery of the LLM. The same intricacies and depth that make language the substrate of your thoughts are then approximated scarily well by LLMs. While we don't know exactly how it works, we know that it's an emergent property of the training process. Even things like tone or style, both of which are, paradoxically, outrageously hard to describe in words, are modeled by LLMs. This means that yes, unfortunately, if you were to fine-tune an LLM on all of your writing and instant-messaging conversations, it would totally be able to replicate your style. It's the same principle that lets you easily detect, from a one-line SMS, that your wife is angry at you. Another interesting example is how people are starting to develop parasocial relationships with AI chatbots, sometimes with dire consequences.
The second time we experienced telepathy, Paolo and I were having drinks, again, but this time at my place. We were discussing how R and I have English as our love-layer, our first example. Similarly, Paolo was sharing how he and his friends speak a version of Spanish specific to the few blocks of Mexico City where he grew up. There was no single phrase that triggered us, but, just as in our first telepathic encounter, we reached exactly the same conclusion: "language is crazy and it has so much depth!" Which then devolved into "Wow, language IS thought", and eventually morphed into "… and that's why LLMs work so well!" Call it what you want: the ramblings of two friends, or simply a narrative vehicle to hook you into reading this essay until the end. However, it is true. We both thought it and we both knew that the other person was thinking it. We might have even produced an unintelligible whiteboard, which roughly served as the basis for this essay. For a second time, our shared understanding of reality, the content we were discussing and, unfortunately, our jobs as PMs working with LLMs led us to exactly the same train of thought. This was the second telepathic moment, and we really wish that everyone could one day experience this level of semantic connection.
We allowed our thoughts to wander for a bit. Despite not having whiteboards to prove it, our minds were blown a few more times that night. What if language is just one of the many substrates that can describe some aspect of reality? As we already saw, math can do it quite well. But what about code? What about sounds, music, light, images and colors? Each substrate is a kind of representation which describes, at different levels, a different aspect of reality. The vibes are totally real. However, we will leave these exploding thoughts for some other time. For now, we can just hope that this essay has been able to telepathically transmit to you, dear reader, our core idea: that LLMs work because they have been able to model language, one of the core components of our thoughts, in all of its richness and depth.
----- Seal of Organic, AI-free human thinking. We vouch that we did not use AI to write the content of this essay. As tools, we did use LLMs to do research, circle around ideas and perhaps find the perfect word for a sentence. Ultimately, that's what LLMs are great at! However, we believe that our thoughts and ideas are what make us human, and offloading the entire thinking process to an LLM leaves the writing completely devoid of them. Plus, it's you, the human writer / reader, that gives it meaning, versus the five-page essay that ChatGPT could have written from a simple prompt.
Who are we, and what are our credentials to dare write this? Dragana is a Technical Product Manager at Amazon, working on the Kindle Direct Publishing team. She is currently building a 0-to-1 LLM-powered product. She is also a bit of a polyglot, speaking 4 languages, a crystal-clear thinker, and a menace of a six-pager writer.
Paolo sits at the intersection of math & business. He once described himself as a datamancer, an AI-maximalist and an electronic music enthusiast. He is currently a GenAI Subject Matter Expert at Amazon, where he is empowering a very large org with AI tools. He's also surprisingly persistent at getting us to finish this essay.