2023 wasn’t a great year for AI detectors. Leaders like GPTZero soared in popularity but faced backlash as false positives led to false accusations. OpenAI then quietly put a damper on the idea with an FAQ entry asking whether AI detectors work. The verdict? “No, that’s not our experience.”
OpenAI’s conclusion was correct at the time. But the demise of AI detectors has been greatly exaggerated. Researchers are inventing new detectors that perform better than their predecessors and can operate at scale. And they are accompanied by “data poisoning” attacks that individuals can use to protect their work from being used, against their wishes, to train AI models.
“Language model detection can be performed with sufficient precision to be useful, and it can also be performed in a ‘zero-shot’ sense, meaning that all kinds of different language models can be detected simultaneously,” says Tom Goldstein, a professor of computer science at the University of Maryland. “This is the real counterpoint to the idea that language model detection is basically impossible.”
Using AI to detect AI
Goldstein is one of the co-authors of a paper recently uploaded to the arXiv preprint server that describes Binoculars, a detector that pairs an AI detective with a helpful sidekick.
Early AI detectors played detective by asking a simple question: “How surprising is this text?” The assumption was that text that is statistically less surprising is more likely to be AI-generated. An LLM’s mission is to predict the “correct” word at each point in a text string, and that leaves a pattern a detector can pick up on. Most detectors answered the question by giving the user a numerical probability that the submitted text was generated by AI.
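As a rough illustration of that idea (a simplified sketch, not the code of any particular detector), the “surprise” score can be computed as perplexity under a small open-source language model such as GPT-2; the lower the perplexity, the less the model is surprised, and the more “AI-like” the text looks under this heuristic.

```python
# Minimal perplexity scorer: how "surprising" is a piece of text to a language model?
# Illustrative only; real detectors calibrate and threshold this score far more carefully.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return the mean next-token cross-entropy.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Lower scores mean statistically "unsurprising" text, which early detectors
# treated as a hint that a language model wrote it.
print(perplexity("The quick brown fox jumps over the lazy dog."))
```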
But that approach is flawed. AI-generated text can look surprising if it was written in response to an unusual prompt that the detector has no way of seeing. The reverse is also true: humans may write unsurprising sentences when they address well-worn topics.
Detectors will prove useful to businesses, governments, and educational institutions only if they create fewer problems than they solve, and false positives create the most problems.
Binoculars asks an AI detective (in this case Falcon-7B, an open-source large language model) the same question as previous detectors, but it also asks an AI sidekick to perform the same task. It then compares the results, calculating how surprising the sidekick’s output is to the detective to establish a baseline for comparison. Text written by a human should prove more surprising to the detective than the AI sidekick’s output does.
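The paper and its accompanying code spell out the exact calculation; the sketch below only captures the spirit of it under simplifying assumptions. It divides how surprising the text is to the “detective” model by how surprising the “sidekick’s” own next-word predictions are to that same detective; the Falcon model names follow the paper, but any smaller pair of models sharing a tokenizer would illustrate the idea.

```python
# Binoculars-style score, heavily simplified: perplexity divided by cross-perplexity.
# Model choices and details here are illustrative, not the authors' implementation.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

observer = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b").eval()            # the detective
performer = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b-instruct").eval()  # the sidekick
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")

def binoculars_score(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        obs_logits = observer(ids).logits[:, :-1]    # detective's predictions for each next token
        perf_logits = performer(ids).logits[:, :-1]  # sidekick's predictions at the same positions
    targets = ids[:, 1:]

    # How surprising the text itself is to the detective (log-perplexity).
    log_ppl = F.cross_entropy(obs_logits.transpose(1, 2), targets)

    # How surprising the sidekick's predictions are to the detective (cross-perplexity baseline).
    perf_probs = F.softmax(perf_logits, dim=-1)
    obs_log_probs = F.log_softmax(obs_logits, dim=-1)
    log_xppl = -(perf_probs * obs_log_probs).sum(dim=-1).mean()

    # Low ratio: the text surprises the detective about as little as machine output does.
    # High ratio: the text is more surprising than the sidekick, a hint that a human wrote it.
    return (log_ppl / log_xppl).item()
```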
There are gaps in what Binoculars can see, however. Vinu Sankar Sadasivan, a graduate student in the University of Maryland’s Department of Computer Science and co-author of another preprint paper evaluating various LLM detection techniques, says Binoculars “significantly improves zero-shot detection performance, but it is not better than watermarking or search-based methods in terms of accuracy.” Binoculars is also still under peer review; Avi Schwarzschild, a co-author of the Binoculars paper, says the goal is to present it at a major AI conference.
But Goldstein insists accuracy isn’t Binoculars’ secret sauce. He believes its real benefit lies in revealing AI text while keeping false positives low.
“People tend to focus on precision, which is a mistake,” Goldstein says. “If the detector incorrectly determines that text written by a human was written by a language model, […] it can lead to false accusations. But if you make a mistake and claim AI text was written by a human, that’s not so bad.”
That may feel counterintuitive. AI can generate text at incredible scale, so plenty of AI-generated text could slip past even a near-perfect detector.
But a detector will prove useful to businesses, governments, and educational institutions only if it creates fewer problems than it solves, and few things cause more headaches than false positives. Goldstein points out that even with a single-digit false-positive rate, the scale of modern social networks would produce tens of thousands of false accusations each day, undermining confidence in the detector itself.
Deepfakes fool people, but not detectors
Of course, AI-generated text is just one front in this battle. Images generated by AI are a concern as well, with recent research proving that they can deceive humans.
A preprint paper by researchers at Germany’s Ruhr University Bochum and the Helmholtz Center for Information Security found that people cannot reliably distinguish AI-generated images of human faces from real photographs. And researchers from Indiana University and Northeastern University in Boston estimate that between 8,500 and 17,500 of the accounts active on the social media platform X (formerly Twitter) each day use AI-generated profile photos.
It gets worse. These findings focus on generative adversarial networks (GANs), an older class of AI image models that are better understood and known to leave discernible artifacts in their images. But the state-of-the-art image generators currently sweeping the Internet, such as Stability AI’s Stable Diffusion, instead use diffusion probabilistic models, which learn to convert random noise into images that mimic what the user wants. Diffusion models perform significantly better than earlier GAN models.
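For a sense of what “converting random noise into images” means mechanically, here is a toy sketch of the forward half of a denoising diffusion model: noise is blended into a clean image according to a fixed schedule, and the generator is trained to undo that corruption step by step (the reverse, image-producing half is omitted). The schedule values are generic textbook choices, not those of any particular model.

```python
# Toy forward-noising process of a denoising diffusion model (DDPM-style).
# A diffusion generator is trained to reverse this corruption, step by step,
# eventually turning pure noise into an image matching the user's prompt.
import torch

T = 1000                                        # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)           # illustrative linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal-retention factors

def noisy_image(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Sample x_t from q(x_t | x_0): a blend of the clean image and Gaussian noise."""
    eps = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * eps

x0 = torch.rand(3, 64, 64)           # stand-in "clean image"
print(noisy_image(x0, 10).std())     # early step: still mostly image
print(noisy_image(x0, T - 1).std())  # final step: essentially pure noise
```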
Even so, diffusion models turned out to be less insidious than feared. A preprint paper to be presented at the International Conference on Computer Vision Theory and Applications (VISAPP 2024), to be held in Rome from February 27 to 29, shows that, with a few adjustments, GAN-trained detectors can detect diffusion models as well.
“We found that the GAN-trained detectors we already had could not detect diffusion,” says study co-author Jonas Ricker, a graduate student at Ruhr-University Bochum. “But that doesn’t mean it can’t be detected. If we update these detectors, we can still detect [images generated by diffusion models].” This suggests that early AI image detectors don’t need to be reinvented from scratch to catch modern models.
Detection accuracy for diffusion models is lower than for GANs, where it can reach 100 percent in some cases, but it is still often above 90 percent (results vary depending on the diffusion model and detector used). Detectors that are fine-tuned to detect diffusion models can also detect images produced by GANs, making them useful in many situations.
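The researchers’ detectors are specific architectures trained on specific data, but the “update” they describe is, at its core, ordinary fine-tuning: take an existing real-versus-fake image classifier and continue training it on examples from the newer generators. The sketch below shows that general pattern with a stock torchvision backbone and hypothetical dataset folders; it is not the authors’ code.

```python
# Generic fine-tuning loop: adapt an existing real-vs-fake image classifier so it
# also catches diffusion-model images. Paths, backbone, and hyperparameters are
# placeholders, not the VISAPP 2024 authors' setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# A pretrained backbone stands in for the original GAN-era detector.
detector = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
detector.fc = nn.Linear(detector.fc.in_features, 2)  # two classes: real, generated
detector.to(device)

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
# Expects subfolders "real/" and "generated/" filled with diffusion-model images.
train_set = datasets.ImageFolder("data/diffusion_finetune", transform=tf)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(detector.parameters(), lr=1e-5)  # small LR: adapt, don't forget
criterion = nn.CrossEntropyLoss()

detector.train()
for epoch in range(3):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(detector(images), labels)
        loss.backward()
        optimizer.step()
```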
But what exactly does the updated detector detect? As is often the case with modern AI models, the answer is a bit of a mystery. “It’s not very clear which artifacts are detectable,” says Simon Damm, another graduate student at Ruhr-University Bochum and one of the co-authors of the VISAPP 2024 paper. “The performance shows they are reliably detectable […] but the interpretability is not there.”
Data poisoning degrades AI models
Although modern AI detectors are promising, detection is an inherently defensive approach. Some researchers are investigating preventative tactics, such as data poisoning, that confuse AI models during training.
Perhaps the most powerful example is Nightshade, a technique invented by researchers at the University of Chicago. Nightshade is a prompt-specific data-poisoning attack built to degrade diffusion models. It makes subtle changes to an image’s pixels, most of them invisible to the human eye. An AI model trained on these images, however, learns incorrect associations: it might learn that a car looks like a cow, or that a hat looks like a toaster.
“The easiest way to describe it is a little poison pill that you can put into your art,” says Ben Y. Zhao, a computer science professor at the University of Chicago and one of the researchers who developed Nightshade. “If it is downloaded against your will [and used for training an AI model], it can have a negative effect on the model.”
Importantly, Nightshade can degrade a model even if only a small fraction of its training data is poisoned. Against newer models, such as Stable Diffusion XL, it can attack a specific concept (such as “dog” or “cat”) with as few as 100 poisoned samples. The effects compound across multiple attacks, spilling over into related concepts and degrading a variety of models.
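Nightshade itself involves further machinery (prompt-specific targeting, perceptual constraints, and more), but its core move, nudging an image’s pixels so that a model’s feature extractor “sees” a different concept while a person sees the same picture, can be sketched as a small adversarial optimization. The version below uses a frozen CLIP image encoder as a stand-in feature extractor and a simple pixel budget; it is an illustration of the general idea, not the Nightshade implementation.

```python
# Toy poisoning perturbation: push an image's features toward a target concept while
# keeping the pixel changes small. A rough illustration only; NOT Nightshade, which
# adds prompt-specific targeting, perceptual constraints, and more.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
for p in model.parameters():
    p.requires_grad_(False)  # the feature extractor stays frozen

def poison(cover_img: Image.Image, anchor_img: Image.Image,
           eps: float = 0.05, steps: int = 100, lr: float = 0.01) -> torch.Tensor:
    """Return a perturbed copy of cover_img whose features resemble anchor_img's."""
    cover = processor(images=cover_img, return_tensors="pt")["pixel_values"]
    anchor = processor(images=anchor_img, return_tensors="pt")["pixel_values"]
    with torch.no_grad():
        target_feat = model.get_image_features(pixel_values=anchor)

    delta = torch.zeros_like(cover, requires_grad=True)
    for _ in range(steps):
        feat = model.get_image_features(pixel_values=cover + delta)
        # Pull the cover image's features toward the anchor concept's features.
        loss = torch.nn.functional.mse_loss(feat, target_feat)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # gradient step (PGD-style)
            delta.clamp_(-eps, eps)          # keep the change visually subtle
            delta.grad.zero_()
    return (cover + delta).detach()  # e.g., a "dog" photo whose features read as "cat"
```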
“The easiest way to describe it is a little poison pill that you can put into your art.” —Ben Y. Zhao, University of Chicago
It is still unclear how this will play out in the real world. The researchers had only a single Nvidia A100 GPU for training, far short of the dozens or hundreds of machines used to train modern models, so they were limited to smaller datasets than those used by OpenAI and Stability AI. Nightshade is still under peer review, but the team plans to present it at a conference in May. “There are still unknown factors that are difficult to control,” Zhao says. “But if you look at it at a higher level, you should see the effect.”
Zhao hopes that Nightshade, which is free for anyone to download and use, will be a more effective tool than opt-out schemes or no-crawl requests. Compliance with such requests is voluntary, and although many large AI companies and organizations have committed to honoring them, compliance is not legally required and is difficult to verify. Data poisoning doesn’t require anyone’s cooperation; instead, it protects content by degrading any image model that trains on the poisoned data.
“We’re not trying to break the model out of malice,” Zhao says. “We’re trying to give content owners the tools to stop unauthorized scraping for AI training. This is a way to push back, and it’s something that really changes the incentives for model trainers.”