Natural Language Processing (NLP): Meta’s Galactica and other Epic Fails

Language AI series, Part I

by Jobst Landgrebe, Cognotekt GmbH

Meta Platforms, Inc., the company that owns Facebook, recently shut down Galactica, its AI-based NLP service. Meta’s chief AI scientist, Yann LeCun, had previously described Galactica as a large language model (LLM) trained on scientific papers. LeCun tweeted, “Type a text and [Galactica] will generate a paper with relevant references, formulas, and everything. Amazing work by @MetaAI.” Meta management claimed that Galactica “can store, combine, and reason about scientific knowledge” and perform tasks such as solving equations, summarizing scientific texts, or searching for relevant literature. If true, this would mean that the AI acts like a superhuman polymath, able to understand and evaluate the most intellectually demanding texts that humans can produce.

But only days after its release, the well-known AI critic Douglas Hofstadter stated, “What I find is that it’s a very bizarre mixture of ideas that are solid and good with ideas that are crazy. It’s as if you took a lot of very good food and some dog excrement and blended it all up so that you can’t possibly figure out what’s good or bad.”

Quoting Hofstadter, AI expert Gary Marcus added that such AI-based language models “confabulate math and science. High school students will love it and use it to fool and intimidate (some of) their teachers. The rest of us should be terrified.” (See samples of the nonsense the model produces.) In short, Galactica failed to deliver on any of its promises, as the international scientific community quickly realized, and Meta has now discontinued the service.

This is not the first language AI that a big tech company has had to shut down. Other prominent failures are the chatbots “Tay” by Microsoft and Facebook’s “M.” Tay, shut down in 2016, started to post extremist and sexist messages on Twitter, demonstrating that an AI trained on user-generated language can always be manipulated and is therefore not fit for purpose. “M,” which also failed to meet its users’ needs, was stopped in 2018. In the same year, another Facebook AI labelled the Declaration of Independence, the modern legal foundation of human rights, as “hate speech”: a spectacular example of AI misinterpretation and failure.

What is the reason for these failures?

Why the models fail

Such language models fail because they do not understand the complexity of human language. Instead, they model sequences of symbols without the ability to interpret their meaning. In a deep sense, they can never achieve an understanding of meaning, because meaning requires consciousness and intentions: “What does this mean for me?” is the question we consciously and unconsciously ask ourselves when we encounter a unit of language or a symbol (e.g. a stop sign or a crucifix). But machines have neither consciousness nor intentions; they merely execute calculation patterns, albeit very complicated ones.

Whenever the aspect of a system that a language model is trained to emulate can be modelled as a sequence of symbols, the model achieves acceptable functionality, as in the so-called neural translation models. Even there, however, the translation quality is too poor for mission-critical uses such as legal texts or any professional setting in which an interpreter would be employed. Without an understanding of the semantics and pragmatics of the situation, there can be no output of sufficient quality. Wherever an agent needs to understand language, or to produce utterances (which requires intentionality), machine-based systems fail to communicate valid and reliable messages. This is why dialogue systems fail, and why machines cannot operate on texts in the way that the marketing people at Meta imagined. Extracting meaning from text at the level of human understanding is outside the scope of such systems. The main reason is that human language is the output of a non-ergodic process, so that no matter what samples are used to train an AI, the resulting model will never be adequate for new language utterances (more details here in chapter 10). Only models for highly repetitive language, such as user commands for Siri or for Alexa, the shopping assistant built by Amazon, can satisfy the highly constrained needs of their users to some extent.

It is not a matter of adequate training data or of the current stage of technological capability. It is a matter of immutable mathematical limits inherent to machine processes.

Why big tech does not adapt

The latest failure of a big AI-based language model, Galactica, does not seem to deter transhumanist techno-solutionists from continuing to butt their heads against the wall. Recently, journalists at the New York Times described the text-writing abilities of the language model GPT-3 as being on par with those of humans. Elon Musk repeated last month that he sees conscious AI coming soon, and he uses this idea to promote his company Neuralink as a way to protect mankind from dystopian machines. Such notions make for exciting science fiction, but even the most basic functional analysis shows them to be flatly unrealistic. “The failure of Galactica is just an interim result,” tech managers state optimistically. “We are getting better every day.” It is clear that the tech giants currently prefer to bet on mathematically infeasible promises rather than focus on what can be done. According to Pitchbook, VC investment in AI and ML technologies plummeted from $118 bn in 2021 to $48.2 bn in 2022, perhaps a sign that science-based realism is making a comeback and causing investors to stop funding more towers of Babel.

Read the next newsletter to see what can be achieved with language AI and how to recognize it.

