by Jobst Landgrebe, Cognotekt GmbH
The first part of this series, “Meta’s Galactica and other Epic Fails,” explained why NLP large language models (LLMs) regularly fail and how proponents of AI innovation often overlook the technology’s inherent limits. Let’s now look at what can be done with NLP and LLMs.
Shortly after we published “Epic Fails,” the model chatGPT was released, creating a huge wave of excitement and anticipation. Has the recent uptake of chatGPT and its results rendered my last article obsolete? No, but there is a finer point here. While some AI models, such as Galactica, have failed miserably, those that do not “fail” vary in their efficacy and potential to create value. I’ll examine several LLM use cases and share how to spot the most viable applications.
First, let’s quickly recap what language models do. They model language as sequences of symbols so that machines can solve certain types of tasks. For example, in so-called neural machine translation, input symbols in one language trigger a series of output symbols in another language. The relationship between the two is learned from the training data, resulting in a plausible translation. Similarly, texts can be generated using foundational LLMs. GPT-3 is an example of such symbol-and-sequence-assembling AI: it can compose texts in certain repetitive situations, but it does not actually understand anything.
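The sequence-modeling principle can be illustrated with a toy bigram model. This is a minimal sketch, not how LLMs are built (they use transformer networks with billions of parameters), but the core mechanic of predicting the next symbol from the preceding ones is the same; the corpus and function names are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy bigram model: count which word follows which in a tiny corpus,
# then generate text by greedily picking the most frequent continuation.
corpus = "the model predicts the next word from the previous word".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=5):
    word, out = start, [start]
    for _ in range(length):
        if word not in follows:
            break
        word = follows[word].most_common(1)[0][0]  # greedy next-symbol choice
        out.append(word)
    return " ".join(out)

print(generate("the"))
```

Nothing in this procedure represents meaning; it only reproduces sequence statistics, which is the point made above.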
Which use cases are viable?
Internet search is one of the oldest and most profitable use cases, enabling widespread consumer access to Internet content since the late 1990s. Today, it involves NLP components to identify texts beyond classical keyword search; however, no semantic search is available (nor will it become available any time soon). As a result, users must heavily filter search results themselves to find what they are seeking.
To partially meet this need, internet search firms such as Google are now considering a pivot that would turn information retrieval engines into query answering/knowledge retrieval machines. A recent example of this is chatGPT. The technology, however, often mixes useful output with utter nonsense (see Appendix). Currently, it is neither possible to show users the LLM’s output sources, nor to filter the output reliably to suppress such nonsense. It is unlikely that the latter will ever become possible in a reliable manner using this technology. Nevertheless, there is commercial potential in question-answering and knowledge-retrieval, as long as users are willing to filter the output themselves.
Social media screening is a newer NLP-based application used by big social media corporations to monitor user-generated content. These are classification algorithms which assign labels such as “inconspicuous,” “problematic,” “conspiracy theory,” “fake news,” or “hate speech” to texts or to voice recordings rendered as text. Their effectiveness is rather low, as the algorithms can only look for syntactic patterns and produce huge numbers of false negatives and false positives. Will LLMs change this? Not much, because they can only produce output that corresponds to the typical textual sequence between a certain piece of content and the reactions to it found on the web. For example, scientific texts stating that there are two sexes are routinely classified as hate speech because this view dominates online discussions concerning gender. Conversely, LLMs can generate actual hate speech, such as antisemitism, because of the prevalence of antisemitic rhetoric in the online text corpus used for training. LLMs may further compound such outputs by combining them with heuristics concerning content authors, such as their network relatedness to manually tagged (“bad”) users or the type of content they follow. However fallible, this type of screening is regarded as crucial by both Western and Asian governments and presents a potentially massive revenue stream for social media networks.
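Why purely syntactic screening misfires can be shown with a minimal sketch. The trigger list and sample texts below are invented for illustration; real systems are far larger, but the failure mode of matching surface patterns instead of meaning is the same:

```python
# Naive syntactic screening: flag any text containing a trigger phrase.
# Because matching is purely on surface form, a factual statement and an
# abusive one containing the same phrase receive the same label.
TRIGGERS = {"two sexes", "fake"}

def screen(text: str) -> str:
    lowered = text.lower()
    return "problematic" if any(t in lowered for t in TRIGGERS) else "inconspicuous"

print(screen("The biology textbook describes two sexes."))   # false positive
print(screen("Subtle abuse phrased without trigger words."))  # false negative
```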
Advertising and recommendation leverage NLP to deliver relevant ads or content to an individual based on language found in user-generated texts and website content aggregated throughout the individual’s online journey. This is one of the best applications of inexact, syntactic NLP: because no harm is done if an ad does not perfectly target the user, it presents only a potential upside. As such, it is the basis of a major tech industry business model that acts as a private tax on all other businesses.
Automated journalism to generate short text on weather, sports, and economic trends works quite well, but its value for users is limited. It is only viable as one ingredient in a mix of human-made content and ads. The idea that human journalists could be replaced by AI is nutty.
Repetitive text generation using LLMs like chatGPT works in generic business domains such as routine customer request answering; product-, management-, marketing-, and sales-related text creation; the creation of texts for HR purposes (recruiting, candidate evaluation, and rating); or the drafting of generic project management schedules. Because the output always consists of symbol sequences that reflect sequences found in the source corpora, the user must always review and edit the text before using it, but part of the sentence-assembly and typing burden is taken off the user. The effect is not much better than using text templates, but the savings can be rationalized. Note that the nature of the text itself must be utterly “soulless” and routine for the algorithm to be effective; any language deviating from the most basic usage patterns would be unmanageable.
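To make the comparison with templates concrete, here is a sketch of the template baseline. The fields and wording are invented for illustration; an LLM assembles similar routine text, with the added burden of review:

```python
from string import Template

# Routine customer-service reply from a plain template: the same "soulless"
# output an LLM would assemble, but deterministic and needing no review.
REPLY = Template(
    "Dear $name,\n"
    "thank you for contacting us about $topic. Your request has been "
    "registered under ticket $ticket and we will reply within $days "
    "business days.\n"
    "Kind regards, Customer Service"
)

text = REPLY.substitute(name="Ms. Smith", topic="your invoice",
                        ticket="4711", days="2")
print(text)
```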
Text completion and correction are very useful applications for end users, but their economic potential is limited, as they are mostly seen as a must-have on mobile devices, imposing costs on vendors without generating additional profits. This may be different in niche markets such as cold sales email generation, but even then the potential is limited due to the limited quality of the asemantic output. The correction aspect can also serve as basic debugging functionality for simple bugs in computer programming, but the machine will not make human programmers redundant. The efficiency gain that can be obtained is moderate.
Use cases like information extraction from texts, text classification, or voice-command tools such as Alexa range from low-profit to unprofitable or even loss-making. Amazon is even considering discontinuing Alexa because its impact on sales is so low that it cannot justify the expense of selling the device below production cost, and the additional user information it captures is not well suited for commercialization. Specific information extraction with deep neural networks (DNNs) is unreliable and cannot replace humans. Approximative text classification works but is not viable as a service built into processes where precision and accuracy are imperative.
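For contrast with neural extraction models, a minimal rule-based sketch (the pattern and sentences are invented for illustration): it is exact where its pattern matches and blind everywhere else, which is why neither approach simply replaces human review in precision-critical processes:

```python
import re

# Rule-based extraction of invoice numbers: precise when the surface form
# matches, but silent for any phrasing the pattern does not anticipate.
PATTERN = re.compile(r"invoice\s+no\.\s*(\d+)", re.IGNORECASE)

def extract_invoice(text: str):
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(extract_invoice("Please refer to invoice no. 20931."))   # "20931"
print(extract_invoice("The bill numbered 20931 is overdue."))  # None
```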
Good and viable NLP use cases are those which do not need an understanding of language but can work by treating language as chains of signs. When looking at a use case, ask the following: Does the task require a genuine understanding of meaning rather than merely plausible symbol sequences? Must the output be reliable without a human reviewing and filtering it? Would erroneous or nonsensical output cause real harm?
If you have to answer any of these with “yes,” LLM-NLP is not what you want.
Appendix
On Dec. 27, 2022, chatGPT produced the following nonsense output when given the input “quantum science is confusing”: “Quantum science is an incredibly powerful and useful branch of science, yet it is also highly confusing to many people. This is due to the fact that Quantum science deals with phenomena which are not easily understandable and involve principles that contradict the laws of physics as we know them.”