AI Companies Are Plundering News Sites, and Researchers Have Documented Everything
Artificial intelligence and journalism are becoming increasingly intertwined, and not always fairly.
While big tech companies build multimillion-dollar products using content produced by newsrooms around the world, media outlets continue bearing all the costs of reporting, editing, and publishing without seeing any return for it.
It is like one person filming an entire movie, paying the cast, the director, and the crew, only for someone else to come along, copy everything, and charge admission without passing a single cent back to the people who did the work.
Sounds unfair, right?
Well, that is exactly what new research from McGill University in Montreal, Canada, has documented with hard data. The findings are quite revealing and raise a question the entire industry needs to answer: who is going to foot the bill for journalism in the age of AI? 🤔
What the McGill AI News Audit Revealed
The study, called the AI News Audit, was conducted by professors Taylor Owen and Aengus Bridgman from the Centre for Media, Technology, and Democracy at McGill University. The idea was straightforward: test the major language models to understand how much they know about current news and how much credit they give to the outlets that originally reported those stories.
The researchers tested four widely used AI models — ChatGPT, Gemini, Claude, and Grok — using a sample of 2,267 Canadian news stories. The results showed that these systems are quite well-informed about recent news. However, when web searches were involved, 82% of responses did not include any source attribution. In other words, the information showed up ready and digested, but without any mention of who did the heavy lifting of investigating, fact-checking, and publishing it.
The audit ran two types of tests. The first examined how journalistic content was used to train the AI models. The second analyzed how those models cited news when they incorporated web searches into the answers they delivered to users. This distinction matters because it shows that the problem exists on two separate fronts: both during the building phase and during the usage phase of these systems.
With web search enabled, 52% of responses had at least one link to a Canadian news site, but the source was named in the body of the text only 28% of the time. This means that in most cases, even when a link was tucked away somewhere in the response, the name of the outlet that produced the report simply did not appear clearly for the reader.
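To make the difference between a linked source and a named source concrete, here is a minimal Python sketch of how the two rates could be computed over a batch of chatbot responses. It is not the audit's actual method: the outlet list, the response format, and the names OUTLETS, link_domains, and attribution_rates are all assumptions made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical outlet registry (domain -> outlet name); not taken from the audit.
OUTLETS = {
    "example-gazette.ca": "Example Gazette",
    "sample-herald.ca": "Sample Herald",
}


def link_domains(links):
    """Return the hostnames found in a list of URLs, without a leading 'www.'."""
    domains = set()
    for url in links:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        domains.add(host)
    return domains


def attribution_rates(responses, outlets=OUTLETS):
    """Compute the share of responses that link to, and that name, a known outlet.

    Each response is a dict with a 'text' string and a 'links' list of URLs.
    """
    total = len(responses)
    if total == 0:
        return {"linked": 0.0, "named": 0.0}

    linked = 0
    named = 0
    for response in responses:
        # Linked: at least one URL points at a known outlet domain.
        if link_domains(response["links"]) & set(outlets):
            linked += 1
        # Named: an outlet name appears in the visible body text.
        body = response["text"].lower()
        if any(name.lower() in body for name in outlets.values()):
            named += 1

    return {"linked": linked / total, "named": named / total}


# Made-up responses for illustration only.
SAMPLE = [
    {"text": "According to the Example Gazette, council approved the budget.",
     "links": ["https://www.example-gazette.ca/budget"]},
    {"text": "Council approved the budget on Tuesday.",
     "links": ["https://sample-herald.ca/council"]},
    {"text": "Council approved the budget on Tuesday.", "links": []},
    {"text": "The budget passed after a long debate.", "links": []},
]

rates = attribution_rates(SAMPLE)
print(rates)
```

In this toy sample, half of the responses carry a link to a known outlet but only a quarter name one in the text, which is the same kind of gap the audit describes.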
A Design Choice, Not a Technical Limitation
One of the most revealing points of the research concerns a technical finding that completely changes the tone of the conversation. When the researchers asked AI models about a story from a specific outlet — mentioning the publication by name in the question itself — the responses identified the source between 74% and 97% of the time.
This demonstrates something crucial: AI companies are technically capable of naming journalistic sources. They simply choose not to do it in most situations. As the audit itself highlights, this is a design choice, not a limitation of the systems.
In an interview, Professor Bridgman got straight to the point. He explained that chatbots display journalistic content precisely because it carries accurate and verified information. AI companies recognize the enormous value that journalism provides. These systems are using that material in consumer-facing products, and there should be financial and authorial recognition for that value.
Bridgman also suggested that the links occasionally included in chatbot responses function more as a credibility-building exercise than as a real pathway for readers. Something like saying: trust us, look at our sources. But in practice, most people do not click those links. They take the AI summary and move on, never visiting the site that paid for the reporting.
Paywalls May Not Be Working as Expected
Another concerning finding from the McGill audit was the identification of cases where AI models cited stories that were protected by paywalls — those payment barriers that news sites use to restrict access to subscribers. This suggests that the automated data collection systems used by AI companies may be getting around those barriers in ways that ordinary human readers cannot.
The report notes that paywalls may not be blocking automated retrieval the same way they block human readers. Additional research into this paywall piercing is being conducted at McGill. Other independent studies have already found evidence that the technical protections created by news outlets to prevent data scraping by AI companies are widely ignored.
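One common example of such a technical protection is a robots.txt file, which tells crawlers what they may fetch. The report does not say which protections it has in mind, so treating robots.txt as the relevant mechanism is an assumption here. The sketch below uses Python's standard urllib.robotparser to show what a well-behaved crawler would conclude. The robots.txt content, the story URL, and the is_allowed helper are hypothetical, and GPTBot is used only as an example crawler token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative news site:
# block one AI crawler token, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up story URL for illustration only.
STORY_URL = "https://example-gazette.ca/news/council-budget"

crawler_allowed = is_allowed(ROBOTS_TXT, "GPTBot", STORY_URL)
reader_allowed = is_allowed(ROBOTS_TXT, "Mozilla/5.0", STORY_URL)
print(crawler_allowed, reader_allowed)
```

The check is purely advisory: robots.txt states the publisher's wishes, but nothing in the protocol stops a crawler that never performs this check, which is why ignored rules are hard for an outlet to enforce on its own.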
Bridgman also noted that AI companies use different approaches to answer questions about news. In some cases, they act like a regular person trying to get up to speed on a story. If they hit a paywall, they may back off and look for the same information from free sources scattered across the internet. With the computational power they have, they can piece together enough information from various open sources to deliver the gist of a story, even if the original reporting was locked behind a subscription.
The Cycle That Threatens Local Journalism
Professors Owen and Bridgman summarized the situation quite clearly in their report. According to them, AI companies have built commercial products that depend, in significant part, on the reporting that Canadian journalists produce. And they did so without compensation, without source attribution, and without any obligation to sustain the infrastructure from which they are extracting value. The result is a system that accelerates the economic decline of the very journalism it depends on.
This has very serious practical implications for newsrooms of all sizes, but especially for smaller and local ones that already operate on razor-thin financial margins. A regional outlet covering municipal politics, for example, relies on organic traffic to keep its operation running. If people start asking an AI about what happened at the city council meeting and receive a summary based on what that outlet published — without visiting the site, without generating an ad impression, without contributing to revenue — the financial cycle that sustains that coverage starts to break down.
And when that outlet shuts down, the AI loses a source of relevant local data. But the ones who truly lose are the communities that depended on that journalism to stay informed. This is a concern that has been raised frequently by media experts, who consider local journalism essential for civic literacy and for democracy. 📰
The Role of Legislation: What Canada Has Already Done and What the U.S. Still Has Not
Media legislation is racing to keep up with technology that moves much faster than traditional legislative processes can handle. Some countries have already taken important steps in this direction, and Canada is one of the standouts.
Since 2023, Canada has required tech giants that profit from news to compensate media outlets, through a policy called the Online News Act. Google, for example, began paying 100 million Canadian dollars per year to publishers in the country. Meta, on the other hand, chose a different path: it completely blocked access to news on its platforms in Canada to avoid having to pay. Now, according to recent reports, Meta is considering paying some outlets, but on the condition that they publicly oppose the legislation itself. It is a move that is, shall we say, quite controversial.
After learning about the results of the McGill audit, Canadian Culture Minister Marc Miller stated that the Online News Act is about people paying their fair share and that this principle does not change with the emergence of AI. He pointed out that having news cannibalized and regurgitated undermines the spirit of the original use of that information and that there needs to be a serious conversation with platforms that intend to use this content, including AI companies.
In the United States, the situation is less advanced. A similar policy called the Journalism Competition and Preservation Act (JCPA) had bipartisan support but stalled in Congress in 2023. Since then, there has been no significant progress, despite mounting evidence of unauthorized use of journalistic content by AI systems.
Intellectual Property at the Center of the Debate
The question of intellectual property is not new in the digital world, but it has taken on an entirely different dimension with the rise of large language models. For decades, media outlets have fought legal battles against content aggregators, search engines, and social media platforms that displayed excerpts of stories without paying anything in return.
With generative AI, the problem intensifies because the content is processed, transformed, and synthesized in such a way that the trail back to the original source practically disappears. Unlike a link in a search result, where at least there is a visual reference to the source outlet, AI-generated responses tend to present information as though it were a neutral, authorless fact.
Several news organizations around the world have already started taking legal action. Lawsuits filed by publishers have produced evidence similar to the findings from the McGill audit, reinforcing the argument that we are looking at a systematic appropriation of copyrighted content. These lawsuits, combined with academic research like McGill's, are expected to pressure AI companies into negotiating compensation agreements. And if that does not happen voluntarily, the expectation is that governments around the world will step in to ensure these companies take responsibility.
The Pirated Movie Analogy
One of the most interesting metaphors circulating in this discussion helps put the problem into perspective. Imagine you wanted to avoid paying for a movie ticket at the theater. You could look for free trailers and clips posted on social media. With powerful computers, it would be possible to stitch all of that together quickly into something that roughly resembled the original film.
Then, if you had no scruples, you could charge people for access to that Frankenstein version of the movie, without paying anything at all to the people who wrote, directed, edited, and acted in the original production.
Eventually, there would be no more trailers, clips, or new movies. And that is exactly the risk journalism faces when its content is absorbed by AI systems without any compensation. If the source dries up, AI also runs out of quality material to consume. 💡
News Publishing in the Age of AI: What Changes for Everyone
The way news publishing works is undergoing an unprecedented transformation. Newsrooms were already dealing with the structural decline in ad revenue as budgets migrated to Google and Facebook over the past decade. Now, with the growing use of AI assistants as an entry point for information, the risk of another round of traffic and revenue loss becomes very real.
Where the fight used to be over clicks on links, it is now over the very relevance of direct access to the news outlet, which can be completely bypassed when an AI delivers a ready-made answer. AI companies end up with the subscription and advertising revenue, instead of the news sites that paid to report, edit, and publish the stories.
Even when links are included in AI-generated summaries, most people simply do not click on them. This means AI companies are enabling users to consume the news without ever visiting the sites that produced it. It is a model that benefits only one side of the equation.
What Lies Ahead
The McGill research is yet another important data point in a conversation that is far from over. Professors Owen and Bridgman have expressed willingness to share their research models with academics in other countries, encouraging the production of similar audits in different markets. The more this issue gains visibility and data-backed support, the greater the chances that balanced solutions will start to emerge in both the legal and technological arenas.
Of course, research like this will not produce definitive answers to all the questions surrounding AI and journalism. But, as the original article from The Seattle Times aptly noted, much like an unscrupulous chatbot, it does manage to give us a pretty clear picture of what is going on.
The debate about source attribution, intellectual property, and media legislation as it applies to AI is not just a corporate discussion between big companies. It directly affects the health of the information ecosystem as a whole, the diversity of voices that reach the public, and the ability of democratic societies to stay well-informed. And the sooner this issue gets treated with the seriousness it deserves, the better for everyone. 🗞️
