In a new report from Mozilla, researchers suggest that common methods for disclosing and detecting AI content are not effective enough to prevent the risks associated with AI-generated misinformation. In the analysis, published today, the researchers note that the current guardrails used by many AI content providers and social media platforms are not strong enough to stop malicious actors. The report examines both “human-friendly” techniques, such as labeling AI content with visual or audible warnings, and machine-readable watermarking techniques, such as encryption, embedded metadata, and added statistical patterns.
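To make the “statistical patterns” category concrete, one widely cited scheme (the “green list” watermark proposed by Kirchenbauer et al.) biases a text generator toward a pseudorandom subset of the vocabulary at each step, so a detector can later run a simple frequency test. Below is a toy Python sketch of that idea; the tiny vocabulary, hash-based split, and 0.5 fraction are illustrative assumptions, not anything prescribed in Mozilla’s report.

```python
import hashlib
import random

# Toy vocabulary; a real scheme operates on a model's full token set.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "slow"]

def green_list(prev_token: str, fraction: float = 0.5) -> set:
    # Hash the previous token into a seed so the vocabulary split is
    # deterministic and reproducible by the detector.
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = sorted(VOCAB)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def watermark_score(tokens: list) -> float:
    # Share of tokens that fall in their predecessor's green list.
    # Unwatermarked text should sit near 0.5; a generator that was
    # biased toward green tokens scores noticeably higher.
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(1, len(tokens) - 1)
```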
The risks inherent in AI-generated content intersect with the internet’s current distribution dynamics. Mozilla pointed out that focusing on technical solutions can distract from solving broader systemic problems, such as targeted political advertising, and that self-disclosure alone is not enough.
“Social media, a critical infrastructure for content distribution, accelerates and amplifies that impact,” the report’s authors wrote. “Furthermore, the well-documented problem with social media platforms encouraging emotional and inflammatory content through algorithms could lead to a ‘doubling down’ effect as the distribution of synthetic content is prioritized.”
According to Mozilla, the best approach combines technological solutions and greater transparency with improved media literacy and regulation. Mozilla also pointed to the European Union’s Digital Services Act (DSA), describing it as a “pragmatic approach” that requires platforms to take measures without prescribing specific solutions to follow.
Instead of relying on watermarks, some companies are building their own tools to detect deepfakes and other AI-generated misinformation. Pindrop, an AI audio security provider, has developed a new tool that detects AI-generated audio based on patterns found in calls. The tool, released last week, was built on a dataset of 20 million AI audio clips generated by more than 100 different text-to-speech tools. Pindrop is known for identifying the recent robocall deepfake that imitated President Joe Biden’s voice, and it also detected an audio deepfake of Anthony Bourdain in the 2021 documentary “Roadrunner.”
Pindrop co-founder and CEO Vijay Balasubramaniyan said the technology looks for audio anomalies that distinguish live calls from recordings. For example, Pindrop looks for spectral features of human speech, such as fricatives, that machines cannot easily reproduce. It also detects temporal anomalies, such as how quickly a mouth can physically form sounds like “hello” or how fast a word like “ball” can be spoken.
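As a rough illustration of the kind of spectral cue Balasubramaniyan describes, fricatives such as “s” and “f” concentrate energy in high frequency bands that synthetic audio often reproduces poorly. The Python sketch below computes one crude proxy, the share of spectral energy above a cutoff, using the open-source librosa library; it is an illustrative stand-in rather than Pindrop’s proprietary feature set, and the 4 kHz cutoff is an assumption.

```python
import numpy as np
import librosa  # third-party: pip install librosa

def high_band_energy_ratio(path: str, split_hz: float = 4000.0,
                           n_fft: int = 2048) -> float:
    """Share of spectral energy above split_hz -- a crude proxy for
    fricative energy. Clips that diverge sharply from typical live
    speech could be flagged for closer inspection."""
    y, sr = librosa.load(path, sr=None)            # keep native sample rate
    power = np.abs(librosa.stft(y, n_fft=n_fft)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    return float(power[freqs >= split_hz].sum() / power.sum())
```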
Although Pindrop claims high accuracy, Balasubramaniyan knows it’s a “cat and mouse game” in which malicious actors will keep moving on to newer and better tools. The key, he added, is how quickly a company can respond when that happens. He also pointed out that the transparency and explainability of AI tools are important.
“While a deepfake system sounds like a big monolith, it is made up of many individual engines,” Balasubramaniyan told Digiday. “And each of these engines leaves a distinct signature…We need to be able to dissect the deepfake engine into very small pieces and see what signature each of those small pieces leaves behind. So even if some parts change, other parts remain the same.”
Reddit bets on LLMs in its proposed IPO
When Reddit filed paperwork for its proposed IPO last week, it also revealed some of its plans to use and profit from large language models.
Reddit said in its filing with U.S. regulators that its vast store of user content can help train language models at scale. Reddit had “more than 1 billion posts and 16 billion comments” at the end of 2023, according to the filing. That data may be used to train Reddit’s own models and may also be licensed to other companies.
According to Reddit’s S-1, “Reddit data is a fundamental part of how current AI technology and many LLMs are built.” The company added that it believes its vast corpus of conversational data and knowledge will continue to play a role in training and improving LLMs, and that as its content is updated and grows daily, it expects models trained on Reddit data to be updated to reflect new ideas.
Last week, Reddit also announced a new deal with Google to use Reddit content to train AI models. Terms of the deal were not disclosed, but Reuters reported that Google would pay $60 million for access to the content. Reddit’s S-1 also noted that the company expects revenue from data licenses to be “at least $66.4 million” in 2024.
Chatbots such as OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude could also compete with Reddit’s main platform, the company said, noting that users may choose to use LLMs trained on Reddit data to find information instead of accessing Reddit directly.
Reddit’s IPO filing also includes a lot of information about the company’s advertising business, which makes up the bulk of its revenue. Its 2023 revenue totaled $804 million, a 21% increase over $667 million in 2022.
Prompts and products: More AI news and announcements
- Google announced that it is bringing Gemini models to Performance Max. In addition to image-generation upgrades, Gemini now lets advertisers generate longer headlines and sitelinks. Another upcoming feature will let advertisers generate lifestyle images along with large-scale campaign variations in Performance Max. Separately, Google came under fire last week after its Gemini image generator created images showing people of color in Nazi-era uniforms. (Google paused the tool and promised to “improve” it in a blog post about the issue.)
- NewsGuard researchers said they have identified more than 700 AI-generated websites masquerading as news outlets. According to NewsGuard, these sites publish misinformation and other harmful content in 15 languages, and many are monetized through programmatic advertising.
- Privacy-focused browser Brave has added new capabilities to its AI assistant Leo, helping users read PDFs, analyze Google Drive files, transcribe YouTube videos, and more. The update came just one day after Adobe added a new AI assistant to its Acrobat and Reader apps that generates summaries, analyzes documents, and answers questions.
- Pfizer spoke to Digiday about how it developed a new generative AI marketing platform called “Charlie,” developed in partnership with Publicis Groupe and named after the pharmaceutical giant’s founder.
- A group of ad tech pioneers has launched a new AI startup for publishers, according to VentureBeat.
Input/Output: Question of the Week
Output: As technology companies build their own large language models and a variety of related AI tools, open-source models are also playing a key role alongside closed platforms. Following Meta’s debut of its open-source Llama 2 in 2023, Google released its own open model, Gemma, last week.
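For readers experimenting with open models like these, the barrier to entry is low. Below is a minimal sketch using Hugging Face’s transformers library; “google/gemma-2b” is the Hub id for the smaller Gemma variant (gated behind Google’s license terms on the Hub), and the prompt is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated on the Hugging Face Hub: accept Google's license and
# authenticate (e.g., `huggingface-cli login`) before downloading.
MODEL_ID = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Illustrative prompt; swap in your own marketing or media task.
inputs = tokenizer("Summarize why publishers care about data licensing:",
                   return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```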
Input: If you’re using open-source models in marketing, media, or commerce, let us know by emailing marty@digiday.com.