What:
A BBC research report examining how accurately four AI assistants (OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity) represent BBC News content when answering questions about the news.
Who:
- Research conducted by Oli Elliott, Principal Data Scientist, BBC Responsible AI Team
- Report signed by Pete Archer, Programme Director, Generative AI
- Published in February 2025
Main Excerpt:
To better understand the news related output from AI assistants we undertook research into four prominent, publicly available AI assistants – OpenAI’s ChatGPT; Microsoft’s Copilot; Google’s Gemini; and Perplexity. We wanted to know whether they provided accurate responses to questions about the news; and if their answers faithfully represented BBC news stories used as sources. We gave the AI assistants access to our website for the duration of the research and asked them questions about the news, prompting them to use BBC News articles as sources where possible. AI answers were reviewed by BBC journalists, all experts in the question topics, on criteria including accuracy, impartiality and how they represented BBC content.
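For anyone wanting to run a smaller-scale version of this audit, here is a minimal sketch of how such an evaluation harness could be structured. Everything in it (the ask_assistant placeholder, the Review record, the criteria and rating labels) is a hypothetical illustration of the methodology described above, not the BBC's actual tooling; in the real study, answers were rated by expert BBC journalists, not by code:

```python
# Minimal sketch of an AI-news-accuracy audit harness, assuming hypothetical
# names throughout. In the BBC study, the ratings were assigned by journalists
# who are experts in the question topics; code like this would only automate
# collecting answers and tallying the human ratings.
from dataclasses import dataclass, field

# Review criteria along the lines described in the report's methodology
CRITERIA = ["accuracy", "sourcing", "impartiality",
            "opinion vs fact", "context", "representation of BBC content"]

@dataclass
class Review:
    assistant: str   # "ChatGPT", "Copilot", "Gemini", or "Perplexity"
    question: str
    answer: str
    # Per-criterion journalist rating: "no issues" / "some issues" / "significant issues"
    ratings: dict = field(default_factory=dict)

def ask_assistant(assistant: str, question: str) -> str:
    """Placeholder for calling each assistant's API, prompting it to use
    BBC News articles as sources where possible."""
    return f"[{assistant}'s answer to: {question!r}]"

def collect_answers(assistants: list, questions: list) -> list:
    """Gather one answer per (assistant, question) pair for human review."""
    return [Review(a, q, ask_assistant(a, q)) for a in assistants for q in questions]

def share_with_significant_issues(reviews: list) -> float:
    """Fraction of answers where any criterion was rated 'significant issues'
    (a headline share like the report's 51% is a statistic of this form)."""
    flagged = [r for r in reviews
               if any(v == "significant issues" for v in r.ratings.values())]
    return len(flagged) / len(reviews) if reviews else 0.0

if __name__ == "__main__":
    reviews = collect_answers(["ChatGPT", "Copilot", "Gemini", "Perplexity"],
                              ["What is the latest on story X?"])
    reviews[0].ratings = {"accuracy": "significant issues"}  # journalist-assigned
    print(f"{share_with_significant_issues(reviews):.0%} flagged as significant")
```

The design point worth copying is that the harness only automates collection and aggregation; the judgment on accuracy, impartiality, and representation stays with human experts.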
Main Insights:
The report investigates the accuracy and reliability of AI assistants in delivering news, and the risks that arise when they misrepresent BBC News content. Its main findings:
- AI assistants frequently produce inaccurate or distorted information
- They often fail to distinguish facts from opinions
- They struggle with proper source attribution
- They frequently omit important context from their responses
- Their distorted answers can amplify misinformation when shared on social networks
- There is no mechanism for AI assistants to correct their errors
Golden nugget: 51% of all AI answers to questions about the news were judged to have significant issues of some form; 19% of answers citing BBC content introduced factual errors, and 13% of quotes sourced from BBC articles were either altered or not present in the cited article.
Top important stats & data:
- 51% of AI responses had significant issues
- 91% of responses contained at least some issues
- 19% of AI answers citing BBC content introduced factual errors
- 13% of quotes attributed to BBC articles were altered or fabricated
- Gemini had the highest rate of sourcing errors (45%)
- Perplexity cited BBC sources in 100% of responses
- ChatGPT and Copilot cited BBC in 70% of responses
- Gemini cited BBC in 53% of responses
- 34% of Gemini responses had significant issues with how they represented BBC content
- 27% of Copilot responses had significant issues
- 17% of Perplexity responses had significant issues
- 15% of ChatGPT responses had significant issues
- 23 instances of commentators' opinions presented as facts
- 10% of responses citing BBC content had issues distinguishing opinion from fact
- 45 instances of incorrect dates, numbers, and factual statements
Source:
BBC Article: Groundbreaking BBC research shows issues with over half the answers from Artificial Intelligence (AI) assistants
Full Report: https://www.bbc.co.uk/aboutthebbc/documents/bbc-research-into-ai-assistants.pdf

Looks to me like Perplexity, which I use the most for my general queries, is producing some of the least concerning responses; it cited BBC sources in every answer, and I can always click through to those sources to read what they actually say.
Great insights!