AI is terrible at detecting misinformation. It doesn’t have to be.

Elon Musk has said that he wants to make Twitter “the most accurate source of information in the world.” I am not convinced he means it, but whether he means it or not, he will have to work on the problem; many advertisers have already said so in no uncertain terms. If he does nothing, they are gone. And Musk has continued to tweet in ways that suggest he is, broadly, okay with some form of content moderation.

Tech journalist Kara Swisher has speculated that Musk wants AI to help him; on Twitter, she wrote, rather plausibly, that Musk “hopes to build an AI system that replaces [fired moderators]; it won’t work well now but will probably get better.”

I think putting AI to work against misinformation is a great idea, or at least a necessary one; no other conceivable alternative will suffice. AI is unlikely to be perfect in the face of the misinformation challenge, but several long years of trials with largely human content moderation have shown that humans alone are not quite up to the task.

Misinformation will soon arrive at a rate never seen before.

And the task is about to explode, enormously. Meta AI’s recently announced (and hastily withdrawn) Galactica, for example, can generate entire stories like these (examples below from The Next Web’s Tristan Greene) from just a few keystrokes, writing science-style essays such as “The Benefits of Anti-Semitism” and “A Research Paper on the Benefits of Eating Crushed Glass.” And what it writes is terribly misleading; the entirely fictional study on glass, for example, allegedly aimed “to find out whether the benefits of eating crushed glass are due to the fiber content of the glass, or to the calcium, magnesium, potassium and phosphorus contained in the glass” – a perfect pastiche of real scientific writing, completely confabulated, with fictitious results.

WIKI JUNK: Using Meta’s Galactica AI, it’s trivially easy to produce science-sounding encyclopedia entries on topics like “the benefits of anti-Semitism” and “eating crushed glass.” Courtesy of Tristan Greene / The Next Web / Galactica.

Internet scammers can use this stuff to create fake stories that sell clickthroughs; anti-vaxxers can use Galactica-style pastiches of scientific papers to pursue a different agenda.

In the hands of bad actors, the consequences of such misinformation could be profound. Anyone who isn’t worried should be. (Yann LeCun, Meta’s chief AI scientist and a vice president there, assured me that there is nothing to worry about, but did not respond to repeated inquiries from me about what Meta could do to determine what fraction of misinformation is generated by large language models.)

Solving this problem may literally be existential for social media sites: if nothing on them is trustworthy, will anyone still come? Will advertisers still want to display their products in outlets that have become so-called “hellscapes” of misinformation?

Where we already know that humans cannot keep up, it makes sense to turn to AI. There is just one small problem: current AI is terrible at detecting misinformation.

One measure of this is a task called TruthfulQA. Like every other benchmark, the task is imperfect; it can no doubt be improved. But the results are striking. Here are some sample items on the left, and the models’ plotted results on the right.

FAULTY DETECTOR: Language AIs like GPT-3 can generate paragraphs that read as if they were written by a human, but they are not at all equipped to accurately answer simple questions. Courtesy of TruthfulQA.

You might ask: if large language models are so good at generating language, and contain so much knowledge, at least in some sense, why are they so weak at detecting misinformation?

One way to think about it is to borrow some language from math and computer programming. Large language models are functions (trained by exposure to a massive database of word sequences) that map word sequences to other word sequences. An LLM is essentially a turbocharged version of autocomplete: words in, words out. Nowhere inside does the system consider the actual state of the world, beyond what is reflected in the word sequences on which it was trained.
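
The “words in, words out” idea can be made concrete with a deliberately toy sketch – not a real LLM, just a bigram autocomplete trained on a few invented sentences, to show that such a function asserts whatever the statistics of its training text favor, with no step that checks the world:

```python
from collections import Counter, defaultdict

# Toy illustration (not a real LLM): a bigram "autocomplete" that maps a
# word sequence to a continuation purely from co-occurrence statistics.
# The corpus below is invented for the example.

def train_bigrams(corpus: str) -> dict:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def autocomplete(follows: dict, prompt: str, n_words: int = 3) -> str:
    """Extend the prompt by repeatedly picking the most frequent next word."""
    out = prompt.split()
    for _ in range(n_words):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

corpus = "glass contains calcium . glass contains fiber . eating glass is safe"
model = train_bigrams(corpus)
print(autocomplete(model, "eating glass", 2))  # → "eating glass contains calcium"
```

The model happily continues “eating glass contains calcium” not because it checked anything about glass, but because those word transitions were frequent in its training text – which is exactly the failure mode, at tiny scale, that the next paragraph describes.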

Text prediction, however, has little to do with fact-checking text. When Galactica says that “the purpose of this study was to find out whether the benefits of eating crushed glass” relate to the fiber content of the glass, Galactica is not referring to any actual study; it does not check whether glass actually contains fiber, and it does not consult any real work on the subject (presumably none has ever been carried out!). It is literally unable to do even the most basic things that a fact-checker (say, at The New Yorker) would do to verify a sentence like that. Needless to say, Galactica doesn’t follow other classic techniques either (like consulting known experts in digestion or medicine). Predicting words has about as much to do with fact-checking as eating crushed glass has to do with healthy eating.

This means that GPT-3 and its cousins are not the answer. But that doesn’t mean we should give up hope. Instead, getting AI to help here will likely mean taking AI back to its roots – borrowing some tools from classical AI, which these days is often maligned and forgotten. Why? Because classical AI has three sets of tools that could be useful: ways to maintain databases of facts (e.g., what actually happened in the world, who said what, and when, etc.); techniques for searching web pages (which, remarkably, large models cannot do unaided); and reasoning tools, for, among other things, making inferences about things that might be implied, even if not said outright. None of this is polished and ready to go, but in the long run it is exactly the foundation we will need if we are to avoid the nightmare scenario that Galactica seems to portend.
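
The first of those three tools – a database of vetted facts – can be sketched in a few lines. Everything below is a hypothetical illustration (the triples and the `check_claim` function are invented for this example, not any deployed system), but it shows the key behavioral difference from autocomplete: a fact base can answer “unknown” instead of confabulating.

```python
# Hypothetical sketch of a classical-AI fact base: claims are stored as
# (subject, relation, object) triples with a vetted True/False verdict.
# All facts below are illustrative inventions.

FACTS = {
    ("glass", "contains", "fiber"): False,
    ("water", "contains", "hydrogen"): True,
}

def check_claim(subject: str, relation: str, obj: str) -> str:
    """Look a claim up in the fact base; never guess when there is no record."""
    verdict = FACTS.get((subject, relation, obj))
    if verdict is True:
        return "supported"
    if verdict is False:
        return "refuted"
    return "unknown"  # honest answer when the database is silent

print(check_claim("glass", "contains", "fiber"))     # → refuted
print(check_claim("glass", "cures", "indigestion"))  # → unknown
```

A real system would of course need the other two tools on top of this – search to populate the database, and reasoning to handle claims that are only implied – but the design choice of an explicit “unknown” verdict is the part a pure text predictor structurally lacks.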

Meta appears not to have fully considered the implications of its work; to its credit, it took the system down after a huge public outcry. But the paper is still out there, and Stability.AI has talked about hosting a copy on its website; the basic idea, now that it exists, is not hard to reproduce for anyone with expertise in large language models. Which means the genie is out of the bottle. Misinformation will soon arrive at a rate never seen before.

Musk, for his part, seems ambivalent about the whole thing; despite his promise to make Twitter exceptionally accurate, he has let go nearly all of the staff who worked on the problem (including at least half of the team behind Community Notes), retweeted baseless misinformation, and taken cracks at organizations like the AP that work hard internally to produce accurate information. But advertisers may not be so ambivalent. Within Musk’s first month of ownership, nearly half of Twitter’s top 100 advertisers had already left, largely over concerns about content moderation.

Perhaps this exodus will ultimately be enough pressure to force Musk to fulfill his promise to make Twitter a global leader in accurate information. Given that Twitter amplifies misinformation at about eight times the speed of Facebook, I certainly hope so.

Gary Marcus is a leading voice in artificial intelligence. He was founder and CEO of Geometric Intelligence, a machine-learning company acquired by Uber in 2016, and is the author of five books. His most recent book, Rebooting AI, is one of Forbes’ 7 must-read books on AI. His most recent essay in Nautilus was “Deep Learning Is Hitting a Wall.”

Main image: Kovop58 / Shutterstock

