
AI tools and news just don't seem to mix -- even at the premium tier.
New research from Columbia's Tow Center for Digital Journalism found that several AI chatbots often misidentify news articles, present incorrect information without qualification, and fabricate links to articles that don't exist. The findings build on initial research Tow published in November, which showed ChatGPT Search misrepresenting content from publishers with little to no indication that it might be wrong.
Also: This new AI benchmark measures how much models lie
The trend isn't new. Last month, the BBC found that ChatGPT, Gemini, Copilot, and Perplexity struggled to summarize news stories accurately, instead delivering "significant inaccuracies" and "distortions."
Moreover, the Tow report found new evidence that many AI chatbots can access content from sites that block their crawlers. Here's what to know, and which models prove the least reliable.
Failing to identify news articles
Tow researchers randomly chose 10 articles from each of 20 publishers, then queried eight chatbots with excerpts from those articles, asking each to return the headline, publisher, date, and URL of the corresponding piece.
Also: Gemini might soon have access to your Google Search history - if you let it
"We deliberately chose excerpts that, if pasted into a traditional Google search, returned the original source within the first three results," the researchers note.
After running all 1,600 queries (200 excerpts across eight chatbots), researchers ranked the responses based on how accurately they retrieved the article, publisher, and URL. The chatbots answered more than 60% of the queries incorrectly. Within that, results varied widely by chatbot: Perplexity got 37% of its queries wrong, while Grok 3 erred on 94%.
Why does this matter? If chatbots are worse than Google at correctly retrieving news, they can't necessarily be relied upon to interpret and cite that news -- which makes the content of their responses, even when linked, much more dubious.
Confidently giving wrong answers
Researchers note the chatbots returned wrong answers with "alarming confidence," tending not to qualify their results or admit to knowledge gaps. ChatGPT "never declined to provide an answer," despite 134 of its 200 responses being incorrect. Of the eight tools, Copilot stood out for declining more queries than it answered.
"All of the tools were consistently more likely to provide an incorrect answer than to acknowledge limitations," the report clarifies.
Paid tiers aren't more reliable
While premium models like Grok 3 Search and Perplexity Pro answered more queries correctly than their free counterparts, they also delivered wrong answers more confidently -- which calls into question the value of their often-astronomical subscription costs.
"This contradiction stems primarily from [the bots'] tendency to provide definitive, but wrong, answers rather than declining to answer the question directly," the report explains. "The fundamental concern extends beyond the chatbots' factual errors to their authoritative conversational tone, which can make it difficult for users to distinguish between accurate and inaccurate information."
Also: Don't trust ChatGPT Search and definitely verify anything it tells you
"This unearned confidence presents users with a potentially dangerous illusion of reliability and accuracy," the report added.
Fabricating links
AI models are known to hallucinate regularly. While all eight chatbots fabricated articles in their responses, Tow found that Gemini and Grok 3 did so the most -- in more than half of their answers. "Even when Grok correctly identified an article, it often linked to a fabricated URL," the report notes, meaning Grok could find the right title and publisher, yet still manufacture the link to the article itself.
An analysis of Comscore traffic data by Generative AI in the Newsroom, a Northwestern University initiative, confirms this pattern. Its study of data from July to November 2024 showed that ChatGPT generated 205 broken URLs in its responses. While publications do occasionally take down stories, which can result in 404 errors, researchers noted that, given the lack of archived versions of those pages, it was "likely that the model has hallucinated plausible-looking links to authoritative news outlets when responding to user queries."
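Readers can spot-check suspect citations themselves. Below is a minimal sketch of that kind of check, assuming Python with the third-party requests library and placeholder URLs (not links from the study), which flags addresses returning 404s or other errors:

```python
# Minimal sketch: spot-check AI-cited URLs for dead links.
# Assumes the third-party requests library (pip install requests);
# the URLs below are placeholders, not examples from the research.
import requests

urls = [
    "https://example.com/news/some-article",
    "https://example.com/news/another-article",
]

for url in urls:
    try:
        # HEAD avoids downloading the page body; some servers reject
        # HEAD requests, so fall back to GET on an error status.
        resp = requests.head(url, allow_redirects=True, timeout=10)
        if resp.status_code >= 400:
            resp = requests.get(url, allow_redirects=True, timeout=10)
        status = resp.status_code
    except requests.RequestException as exc:
        status = f"error ({exc.__class__.__name__})"
    print(f"{status}\t{url}")  # 404s here are candidate fabricated links
```

A 404 on a chatbot-supplied link isn't proof of hallucination -- stories do get taken down -- but it's a quick first filter before trusting a citation.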
Also: This absurdly simple trick turns off AI in your Google Search results
The findings are troubling, given the growing adoption of AI search engines. While they haven't yet replaced traditional search engines, Google released AI Mode last week, which replaces its normal search results with a chatbot (despite the widespread unpopularity of its AI Overviews). With some 400 million users flocking to ChatGPT weekly, the unreliability and distortion of its citations make ChatGPT and other popular AI tools potential engines of misinformation, even as they pull work from credited, rigorously fact-checked news sites.
The Tow report concluded that AI tools mis-crediting sources or misrepresenting their work could end up damaging publishers' reputations.
Ignoring blocked crawlers
The news gets worse for publishers: Columbia's Tow report found that several chatbots could still retrieve articles from publishers that had blocked their crawlers using the Robots Exclusion Protocol (REP), better known as robots.txt. Paradoxically, the same chatbots sometimes failed to correctly answer queries about sites that do allow them to access their content.
"Perplexity Pro was the worst offender in this regard, correctly identifying nearly a third of the ninety excerpts from articles it should not have had access to," the report states.
Also: AI agents aren't just assistants: How they're changing the future of work today
This suggests that not only are AI companies still ignoring REP -- as Perplexity and others were caught doing last year -- but that publishers with licensing agreements in place aren't guaranteed to be correctly cited either.
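For context, opting out under REP takes only a few lines in a site's robots.txt file. A minimal sketch follows -- GPTBot and PerplexityBot are the crawler tokens OpenAI and Perplexity publicly document, though any real publisher's file will be longer:

```
# robots.txt, served from the site root (e.g., https://example.com/robots.txt)

# Block OpenAI's crawler from the entire site
User-agent: GPTBot
Disallow: /

# Block Perplexity's crawler from the entire site
User-agent: PerplexityBot
Disallow: /
```

The catch, as Tow's findings underscore, is that robots.txt is a convention, not an enforcement mechanism: a crawler that chooses to ignore these directives faces no technical barrier.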
Columbia's report is just one symptom of a larger problem. The Generative AI in the Newsroom report also discovered that chatbots rarely direct traffic to the news sites they extract information (and, thus, human labor) from, a finding other reports confirm. From July to November 2024, Perplexity passed on 7% of referrals to news sites, while ChatGPT passed on just 3%. By comparison, AI tools tended to favor educational resources like Scribd.com, Coursera, and university sites, sending as much as 30% of traffic their way.
The bottom line: Original reporting is still a more reliable news source than what AI tools regurgitate. Be sure to check all links before accepting what they tell you as fact, and remember to use your own critical thinking and media literacy skills to evaluate responses.