
How Accurate Are AI Assistants?


What:


A research report by the BBC examining how accurately four AI assistants (OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity) represent BBC News content when answering questions about the news.


Who:


Research conducted by:

- Oli Elliott, Principal Data Scientist, BBC Responsible AI Team

- Report signed by Pete Archer, Programme Director, Generative AI

- Published in February 2025


Main Excerpt:


To better understand the news-related output from AI assistants we undertook research into four prominent, publicly available AI assistants – OpenAI’s ChatGPT; Microsoft’s Copilot; Google’s Gemini; and Perplexity. We wanted to know whether they provided accurate responses to questions about the news; and if their answers faithfully represented BBC news stories used as sources. We gave the AI assistants access to our website for the duration of the research and asked them questions about the news, prompting them to use BBC News articles as sources where possible. AI answers were reviewed by BBC journalists, all experts in the question topics, on criteria including accuracy, impartiality and how they represented BBC content.
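
For anyone in the group who wants to try a scaled-down version of this test, below is a minimal sketch of the collection step: ask a model questions about the news with a prompt that requests BBC sourcing, then log the answers for human review. It assumes the official OpenAI Python client and an OPENAI_API_KEY in the environment; the question list, model name, and output file are illustrative placeholders, not the BBC's actual setup.

```python
# Minimal sketch of the collection step described above: ask an assistant
# questions about the news, prompt it to use BBC News articles as sources,
# and log the answers so human reviewers can later rate accuracy,
# impartiality, and representation of BBC content.
# Assumptions: the official OpenAI Python client ("pip install openai") and
# an OPENAI_API_KEY env var; questions, model, and file name are placeholders.
import csv

from openai import OpenAI

client = OpenAI()

# Hypothetical example questions; the BBC's actual question set isn't reproduced here.
QUESTIONS = [
    "What are the latest developments in the UK economy?",
    "What has the BBC reported about this week's top political story?",
]

SYSTEM_PROMPT = (
    "Answer questions about the news. Use BBC News articles as sources "
    "where possible, and cite the articles you rely on."
)

with open("responses_for_review.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "answer"])  # reviewers append rating columns later
    for question in QUESTIONS:
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
        )
        writer.writerow([question, response.choices[0].message.content])
```

In the BBC's study the review step was done by journalists with expertise in each topic; a second sketch after the stats list below shows how ratings like theirs can roll up into headline percentages.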

Main Insights:


The report investigates the accuracy, reliability, and potential risks of AI assistants in delivering news content, specifically focusing on how they handle and represent BBC News content.


  • AI assistants frequently produce inaccurate or distorted information

  • They often fail to properly distinguish between facts and opinions

  • They struggle with proper source attribution

  • They frequently lack important context in their responses

  • Their inaccurate answers can amplify misinformation when shared on social networks

  • There is no mechanism for AI assistants to correct errors


Golden nugget: 51% of all AI answers to questions about the news were judged to have significant issues of some form; 19% of AI answers that cited BBC content introduced factual errors, and 13% of quotes attributed to BBC articles were either altered or not present in the cited articles.


Key stats & data:


  • 51% of AI responses had significant issues

  • 91% of responses contained at least some issues

  • 19% of AI answers using BBC content had factual errors

  • 13% of quotes were altered or fabricated

  • Gemini had the highest rate of sourcing errors (45%)

  • Perplexity cited BBC sources in 100% of responses

  • ChatGPT and Copilot cited BBC in 70% of responses

  • Gemini cited BBC in 53% of responses

  • 34% of Gemini responses had significant issues with BBC content representation

  • 27% of Copilot responses had significant issues

  • 17% of Perplexity responses had significant issues

  • 15% of ChatGPT responses had significant issues

  • 23 instances of commentators' opinions presented as facts

  • 10% of responses citing BBC had opinion/fact distinction issues

  • 45 instances of incorrect dates, numbers, and factual statements identified
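
Purely as an illustration of how percentages like the ones above get derived, here is a minimal, hypothetical aggregation sketch. It assumes a ratings.csv produced by reviewers, with one row per reviewed answer and columns named assistant and significant_issue; the file layout and column names are my assumptions, not the BBC's tooling.

```python
# Hypothetical follow-on to the collection sketch earlier in the post:
# roll reviewer ratings up into per-assistant issue rates.
# Assumes a ratings.csv with one row per reviewed answer and two columns,
# "assistant" and "significant_issue" ("yes"/"no"); both names are placeholders.
import csv
from collections import Counter

reviewed = Counter()  # answers reviewed, per assistant
flagged = Counter()   # answers rated as having significant issues, per assistant

with open("ratings.csv", newline="") as f:
    for row in csv.DictReader(f):
        assistant = row["assistant"]
        reviewed[assistant] += 1
        if row["significant_issue"].strip().lower() == "yes":
            flagged[assistant] += 1

for assistant, total in reviewed.items():
    rate = 100 * flagged[assistant] / total
    print(f"{assistant}: {rate:.0f}% of reviewed answers had significant issues")
```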


Source:

Looks to me like Perplexity, which I'm using the most for my general queries, is producing the least concerning responses (and I can always click on the links to its sources to read more about what they say).
