On 9 February, a clearly panicked Google launched its response to ChatGPT. Part of the launch was a video on Twitter which boasted of some of its new chatbot’s capabilities. One of the questions put to the chatbot in the video (as evidence of its prowess) was about the James Webb Space Telescope and what discoveries it had made.
The chatbot (called Bard) flubbed the answer, and the blooper was only pointed out later by professional astronomer Bruce Macintosh.
A few things then happened. The first was a lot of red faces, given that a simple old-style Google search would have immediately revealed the correct answer. Then there were howls of derision from various AI-sceptical sectors of the public gallery. But none of this was nearly as bad as what happened to Google’s market value, which fell by $120-billion in eight hours. You read that figure right. There are clearly big stakes at play here.
Which reveals AI’s dirty little secret.
ChatGPT and the many competitors snapping at its heels (Bard and others) all rest on a reasonably new piece of statistical alchemy called machine learning, now the core component of artificial intelligence (AI). Machine learning is what it says: a mathematical magic trick that allows an AI to learn from “experiences” fed to it.
There are several ways in which it can do this, but here’s the dirty secret: machine learning cannot easily explain how it gets to its conclusions, nor can it verify that its conclusions are true. It sometimes gives fictional and even seemingly hallucinatory answers. As evidence: a friend asked ChatGPT who Steven Boykey Sidley is. It proclaimed confidently that I have written a bunch of anti-apartheid books and am a senior lecturer in creative writing at UCT, neither of which is even vaguely true.
A system which can’t explain its conclusions and cannot guarantee their veracity and sometimes hallucinates? That’s a pretty large Achilles heel for anything acting in the service of knowledge.
How come? The answer lies quite deep in the statistical and computational weeds of machine learning, beyond my pay grade. But it is basically this: these sorts of systems train themselves on many millions or billions of blobs of text or images or music. They then draw relationships between the blobs, using statistical inference and whatnot, and assign weights to those relationships. And within ChatGPT’s bowels live 300 billion words.
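For the curious, the idea of “drawing relationships and assigning weights” can be shown with a toy. What follows is a minimal sketch, not how ChatGPT actually works: real systems use neural networks with billions of weights, whereas this one merely counts which word follows which in two made-up sentences. The function names and the tiny corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_weights(text):
    """Count which word follows which, then turn the counts into weights."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1

    weights = {}
    for current_word, followers in counts.items():
        total = sum(followers.values())
        # Each weight is the share of times next_word followed current_word.
        weights[current_word] = {w: n / total for w, n in followers.items()}
    return weights


def most_likely_continuation(weights, start, length):
    """Repeatedly append the highest-weighted next word (ties: alphabetical)."""
    output = [start]
    for _ in range(length):
        followers = weights.get(output[-1])
        if not followers:
            break
        best = min(followers, key=lambda w: (-followers[w], w))
        output.append(best)
    return " ".join(output)


# Hypothetical two-sentence "training set", invented for this example.
corpus = (
    "the telescope took the first picture of a distant galaxy . "
    "the author wrote the first novel about a distant war ."
)

weights = train_bigram_weights(corpus)
print(weights["first"])  # {'picture': 0.5, 'novel': 0.5}
print(most_likely_continuation(weights, "the", 6))
# the first novel about a distant galaxy
```

The output, “the first novel about a distant galaxy”, appears in neither training sentence. The toy has stitched together statistically plausible neighbours with no notion of whether the result is true, and nothing in its table of weights records which sentence any given word came from.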
It takes enormous computing power to do this process on a large dataset. Apparently, OpenAI (ChatGPT’s creator) spent hundreds of millions of dollars over many months just on the single-shot computation that assigned the weights. Asking it later “why” it said something would require more computing power than anyone would be willing to spend.
The answers to those “whys” lie in the mists of computing cycle history.
So where does this leave investors and curious onlookers now watching an expected $500-billion migration of value from Google to Microsoft? Not to mention the growing pile of pent-up venture capital champing at the bit behind the starting gates looking for the next big thing. It means that there is big trouble ahead, as reality falls far short of expectations, as is inevitable with the sudden arrival of new fancy disruptive technology.
Far from ready
It also means that the technology is far from ready for prime time for applications that require perfect recall, perfect accreditation, perfect provenance and perfect falsifiability. At least, not without a human hand on the tiller. Which is fine — a world of infinite possibilities awaits as this tool is learnt and mastered by committed human pilots, who check and recheck and keep the tool from stepping outside of its lane.
And, I suppose, a world of dystopia awaits as bad people get hold of this stuff, those who will take advantage of its inability to check itself. Gordon Crovitz, co-chief executive of NewsGuard, which tracks misinformation, said this:
“This tool is going to be the most powerful tool for spreading misinformation that has ever been on the internet. Crafting a new false narrative can now be done at dramatic scale, and much more frequently — it’s like having A.I. agents contributing to disinformation.”
But, on the other hand, there are a great many more benign and useful applications that need not clear the high bar of perfection: graphics, background research, recipe suggestions, basic coding, conversational language interaction, transcription and even art, as I have previously argued.
AI researchers are aware of this glaring weakness: the lack of provenance and proof of veracity. There is a machine learning technique called Reinforcement Learning from Human Feedback (RLHF), on which ChatGPT, Bard and others rely, including Claude from Anthropic, another heavily funded newcomer. The human feedback component is a little grey right now: how often, how deep, by whom and with what guidelines?
And so smart minds are all trying to ferret out the correct balance of machine and human until new algorithms arrive that can cut the tether between the wetware (us) and the dryware (machines), or at least mitigate their interdependency.
Until then, there are going to be peals of laughter and short sales of stock as various hotly anticipated AIs occasionally have an attack of the stupids.
But just as surely, this dirty little secret will be aired, sniffed, taken apart, re-engineered, redeployed and properly fixed. And only then will we be at the dawn of the new era so many have proclaimed is already here. DM
Steven Boykey Sidley is a professor of practice at JBS, University of Johannesburg.