Can journalism survive artificial intelligence (AI)? The answer depends on whether journalism can adapt its business model to the AI era, and on whether policymakers intervene to correct market imbalances, enforce intellectual property rights, and give journalism a fighting chance in the age of generative AI.
Over the past two decades, as technology companies like Apple, Amazon, Google, Meta, and Microsoft have grown into the world’s most valuable companies, the United States has lost one-third of its newspapers and two-thirds of its newspaper journalists. They cannot be replaced by AI.
The U.S. journalism industry cut 2,700 jobs last year alone, and an average of 2.5 newspapers closed each week. Over the past 10 years, traffic to the top 46 news sites has increased by 43%, yet revenue has fallen by 56%. The domination of digital advertising, publishing, audiences, data, cloud, and search by a handful of Silicon Valley-based technology companies has decimated the business model of journalism around the world. And now AI threatens to do it again.
But unlike journalists, AI cannot show up in court, interview defendants behind bars, meet with the grieving parents of school shooting victims, build trust with whistleblowers, or brave the front lines of war. Moreover, without access to the high-quality, human-generated content that journalism provides, content that depicts reality with relative accuracy, the foundation models that power machine learning and generative AI applications of all types will malfunction, degrade, and in some cases collapse, putting entire systems at risk.
Rapid advances in artificial intelligence are giving a small number of powerful technology companies new tools to expand and entrench their already dominant market positions. This makes it difficult, if not impossible, for sectors like journalism and the creative industries to remain independent, much less to serve the public interest as the news industry does.
The ongoing AI revolution will deepen the “platformization” of journalism and extend the power that a handful of technology companies hold over information channels and the public sphere. It will only exacerbate the ways these companies already undermine and undervalue real journalism, even as they exploit the labor of millions of journalists and other creators to build the models and applications that are transforming our economy and society.
Journalism is especially valuable for generative AI search, which depends on real-time information, context, fact-checking, and natural human language. This is an area where journalism, including local journalism, is particularly valuable and should therefore be monetizable. Searches for information about local businesses, community issues, and government become less useful without local journalism to inform the results. Similarly, journalism focused on niche topics, breaking news, and investigative reporting is especially valuable for applications that aim to provide users with up-to-date, relevant, and timely information while combating the scourge of misinformation and low-quality online content.
Publishers are deeply concerned that AI will exacerbate the trend toward zero-click searches, which display the requested information without directing users to the news sites it came from, a trend that has been growing steadily since 2019. A 2022 study found that half of all Google searches result in zero clicks, and only a small percentage of Facebook users click through to the content in their News Feed.
Equally distressing, AI companies are building their systems on massive appropriation of intellectual property and free use of journalistic content. Journalism is more than a mere collection of facts, and it is often gathered at great expense by the journalists reporting the news. It also forms an important part of many of the foundational datasets used to develop and train generative AI systems. News sites account for half of the top 10 sites, and nearly half of the top 25 sites, most represented in Google’s Colossal Clean Crawled Corpus (C4), a dataset used to train some of the most popular large language models (LLMs). C4 is a snapshot of the open-source Common Crawl dataset, filtered to preserve high-quality English-language sources and discard low-quality or problematic content such as profanity and hate speech.
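To make that filtering concrete, here is a minimal Python sketch of C4-style cleaning. It is a hypothetical illustration, not Google’s actual pipeline; the blocklist and thresholds are placeholders.

```python
# Illustrative sketch of C4-style corpus cleaning, not the real pipeline.
# The actual filters also deduplicate lines, require terminal punctuation,
# detect English with a classifier, and use a published "bad words" blocklist.
BLOCKLIST = {"<profanity>", "<slur>"}  # placeholder; the real list has hundreds of terms

def keep_document(text: str) -> bool:
    """Keep a crawled page only if it looks like substantive, clean prose."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3:            # too short to be a real article
        return False
    words = text.lower().split()
    if len(words) < 50:           # low-content page
        return False
    if BLOCKLIST & set(words):    # contains blocked terms
        return False
    return True

pages = ["Example crawled page text goes here ..."]
cleaned = [p for p in pages if keep_document(p)]
print(f"kept {len(cleaned)} of {len(pages)} pages")
```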
Even content placed behind paywalls and intended only for paying subscribers turns up in LLMs and is recycled into the responses they generate. Last year, OpenAI was forced to pause ChatGPT’s new Bing-powered browsing feature after users found they could use it to bypass publisher paywalls. More than half of 1,159 publishers surveyed have asked AI web crawlers to stop scanning their sites, but compliance is voluntary and there is no penalty for ignoring such requests.
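The opt-out mechanism publishers rely on is the decades-old robots.txt convention, which crawlers honor only if they choose to. As a rough sketch of why compliance is purely voluntary, here is a minimal Python example of a “polite” crawler checking a publisher’s robots.txt before fetching; the site is hypothetical, while GPTBot and CCBot are the published user-agent tokens for OpenAI’s and Common Crawl’s crawlers.

```python
# A "polite" crawler voluntarily checking robots.txt before fetching a page.
# Nothing in HTTP forces this check; an impolite crawler can simply skip it.
from urllib import robotparser

SITE = "https://news.example.com"  # hypothetical publisher site

rp = robotparser.RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()  # fetch and parse the publisher's stated crawling rules

# Publishers block AI crawlers by user-agent token in robots.txt, e.g.:
#   User-agent: GPTBot   (OpenAI)
#   User-agent: CCBot    (Common Crawl)
#   Disallow: /
for agent in ("GPTBot", "CCBot", "*"):
    allowed = rp.can_fetch(agent, f"{SITE}/articles/some-story")
    print(f"{agent}: fetch {'allowed' if allowed else 'disallowed'}")

# The enforcement gap described above: can_fetch() only reports the
# publisher's wishes. A crawler that ignores the answer faces no technical
# barrier and, under current rules, no penalty.
```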
As with search and social media, if AI companies are allowed to further cannibalize the journalism industry’s content and revenue, readers and potential subscribers will be driven away from publishers. This will further reduce revenue from subscriptions, advertising, licensing, and affiliates, undermining not only the ability to produce quality journalism but the business model underlying the entire industry.
Fortunately, despite their protests that freely exploiting journalism to develop foundation models and power generative AI applications such as search and content generation is fair use, AI companies are already beginning to strike deals for access to content. OpenAI, in which Microsoft holds a significant stake, has licensing agreements with some of the world’s largest journalism organizations, including The Associated Press, Axel Springer, Le Monde, and the Spanish media conglomerate Prisa, and is reportedly in talks with several more, as are Apple and Google. Although the terms remain largely undisclosed, many of the deals appear to cover the licensing of content, including archives, for a set period (two years seems to be the norm), along with newsroom access to AI tools.
But small, niche, minority-owned, investigative, and local media are being left behind, in part because they lack visibility into the value their journalism provides across the AI value chain, and in part because they lack the resources, clout, and power to pursue deals and negotiate them effectively.
How we decide to allocate intellectual property rights, and whether fair use applies to the development and training of artificial intelligence systems, will have significant implications. Efforts to require technology platforms to negotiate with news publishers, and to allow publishers to bargain collectively over the use of their content, could be particularly helpful in this regard.
News media bargaining codes, already in place in Australia and Canada and under consideration in 12 more jurisdictions, including the United States and several U.S. states, were initially seen as a way to demand fair compensation for the value that news snippets provide to Google Search and Meta’s social media platforms. However, as I have argued in public hearings before the California Senate, the Canadian Parliament, and South Africa’s Competition Commission over the past few months, they can and should also be used to demand compensation for the scraping and crawling of content by AI systems.
The Center for Journalism and Liberty, which I direct, maintains a global tracker of technology and media fair-compensation frameworks, covering adopted and proposed regulations around the world. None of these explicitly mention the use of news content in large language models or generative AI products, but they do cover the scraping and crawling of news publishers’ websites. The former chair of Australia’s competition authority, who authored the country’s pioneering News Media Bargaining Code, has similarly urged publishers to leverage the existing framework to negotiate deals.
If this type of legislation required technology companies to license news publishers’ content, smaller, local, niche, and non-English-language publishers would also be able to negotiate over the use of their content and data. Such journalism is particularly useful for localizing generative search, summarization, content creation, and other applications that draw on journalism to deliver more accurate, timely, and relevant results, especially in languages other than English.
Journalism is also an important source of data for improving the quality of foundation models. These models are plagued by bias, misinformation, and spam, which makes access to diverse sources of high-quality factual information all the more valuable, especially in digitally low-resource languages. Moreover, because the quality of data matters as much as its quantity, journalism’s continual supply of fresh, timely, human-generated data is especially prized.
This is a moment for the news industry to come together. As large media conglomerates and major publications strike deals with tech giants, they must demand frameworks that benefit journalism as a public good, not ones that merely line the pockets of corporate owners. That is why the only way for journalism to survive AI is to double down on journalists. Adapting to and integrating AI is important for journalism, but replacing the journalists in its newsrooms would hasten its demise, with profound implications for democracy in the United States and around the world.
News organizations must figure out how to optimize revenue streams and assert pricing power across the AI value chain, unlocking the value of journalism to different parts of AI systems and applications by adopting sophisticated, dynamic compensation frameworks and pricing strategies for news content. To do so, they will need access to information about how their content is used in AI systems, including in datasets and the underlying model weights. Government regulation will also be needed to make this possible.


