A few months ago, while meeting with an A.I. executive in San Francisco, I spotted a strange sticker on his laptop. The sticker depicted a cartoon of a menacing, octopus-like creature with many eyes and a yellow smiley-face attached to one of its tentacles. I asked what it was.
“Oh, that’s the Shoggoth,” he explained. “It’s the most important meme in A.I.”
The executive explained that the Shoggoth had become a popular reference among workers in artificial intelligence, as a vivid visual metaphor for how a large language model (the type of A.I. system that powers ChatGPT and other chatbots) actually works.
…
In a nutshell, the joke was that in order to prevent A.I. language models from behaving in scary and dangerous ways, A.I. companies have had to train them to act polite and harmless. One popular way to do this is called “reinforcement learning from human feedback,” or R.L.H.F., a process that involves asking humans to score chatbot responses, and feeding those scores back into the A.I. model.
Most A.I. researchers agree that models trained using R.L.H.F. are better behaved than models without it. But some argue that fine-tuning a language model this way doesn’t actually make the underlying model less weird and inscrutable. In their view, it’s just a flimsy, friendly mask that obscures the mysterious beast underneath.