AI Said It With Total Confidence. It Was Also Wrong.

Why these tools make things up, why they sound so sure doing it, and the one habit that protects you.

Ask an AI chatbot a question and you will almost always get an answer. It will be clear, organized, and sure of itself. Most of the time it is right. Some of the time it is making the whole thing up, and it sounds exactly as sure either way. That last part is the problem.

What it means when AI "makes things up"

People who build these tools call it a hallucination. Strip out the jargon and it means the chatbot invents something and states it as fact. A figure, a quote, a court case, a web link, a customer policy. It does not hedge or flag the invention. It hands you a fabrication in the same calm voice it uses for everything that happens to be true.

Two real examples, with real consequences

In 2023, a New York lawyer used ChatGPT to help write a legal brief. The tool produced six court cases that backed his argument perfectly. They had names, citations, and quotes. None of them existed. When he started to doubt them, he asked ChatGPT whether they were real. It told him yes. A judge fined him, a second lawyer, and their firm $5,000, and the case became a standard cautionary tale across the legal profession.

It is not just lawyers. In the spring of 2025, a syndicated summer reading list ran in real newspapers, including the Chicago Sun-Times. Several of the books on it did not exist. The titles had been invented by AI and attached to real, well-known authors. The writer had used a chatbot for research and had not checked the results before they went to print.

It is not lying. It does not know enough to lie.

It helps to know what these tools actually do. A chatbot is not looking facts up in a database. It predicts the next word in a sentence, over and over, from patterns it learned across an enormous amount of text. It is very good at producing writing that sounds right. There is no separate step where it checks whether what it just said is true, and no real sense of "I do not know this one."

That is why rare, specific facts are the danger zone. A common fact shows up countless times in the training text, so the tool reproduces it easily. An obscure one, a single person's birthday, a small case from a minor court, a niche statistic, may appear once or not at all. Asked for it, the tool does not stop. It fills the gap with something shaped like an answer.
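A toy version makes the mechanism easier to see. The sketch below is not how a real chatbot is built. Real systems use large neural networks and look at far more than one previous word. The training text and the function names are made up for illustration. What it does show is the shape of the problem: the program picks the likeliest next word from what it has seen, and it has no step for checking truth or for saying it does not know.

```python
from collections import Counter, defaultdict

# Made-up training text. The France fact appears twice, the Italy fact once.
TRAINING_TEXT = (
    "the capital of france is paris . "
    "the capital of france is paris . "
    "the capital of italy is rome ."
)

def train_bigrams(text):
    """Count which word follows each word in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        follows[current][following] += 1
    return follows

def next_word(follows, word):
    """Return the most frequent follower of `word`. It never declines to answer."""
    candidates = follows.get(word)
    if not candidates:
        # Unseen word: pool every follower count and answer anyway.
        candidates = Counter()
        for counter in follows.values():
            candidates.update(counter)
    return candidates.most_common(1)[0][0]

model = train_bigrams(TRAINING_TEXT)
for prompt in ("the capital of france is", "the capital of italy is"):
    last_word = prompt.split()[-1]
    print(prompt, next_word(model, last_word))
# Both prompts are completed with "paris".
```

The common fact comes out right. The rarer one gets overwritten by the more frequent pattern, and the output looks equally sure in both cases.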

In September 2025, researchers at OpenAI argued that the problem is baked in deeper than bad data. The way these systems are trained and scored rewards confident guessing. Picture a multiple-choice exam where a blank earns nothing but a guess might earn a point. Over time the tool learns that guessing beats admitting it is unsure. Other researchers say that is part of the story rather than all of it, but the takeaway holds: these tools are built to always produce an answer, not to tell you when they do not have one.
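The exam arithmetic can be written out in a few lines. This is a rough sketch under simple assumptions: a right answer earns one point, a blank earns zero, and the penalty for a wrong answer is a hypothetical knob added here for comparison. The function names are illustrative and are not taken from the OpenAI paper.

```python
ABSTAIN_SCORE = 0.0  # a blank, or "I don't know", earns nothing

def expected_score(p_correct, wrong_penalty=0.0):
    """Average points from guessing when a right answer earns 1 point."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

def best_policy(p_correct, wrong_penalty=0.0):
    """Pick whichever option scores higher on average."""
    if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE:
        return "guess"
    return "abstain"

# One chance in four of being right, no penalty for a wrong answer.
print(expected_score(0.25), best_policy(0.25))            # 0.25 guess
# Same odds, but a wrong answer now costs a full point.
print(expected_score(0.25, 1.0), best_policy(0.25, 1.0))  # -0.5 abstain
```

With no penalty for being wrong, even a long-shot guess beats a blank on average, so a system scored this way is pushed toward always answering.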

Newer is not automatically safer

It is tempting to assume the next, smarter version fixes this. It does not always. When OpenAI tested its own newer reasoning models on PersonQA, a benchmark that asks about real people, the newer ones hallucinated more often than the older models, not less. OpenAI said plainly that it was not sure why. Newer models can also do better on other tests, so this is not a straight line in one direction. The point is simpler: do not assume a newer or pricier model has solved the problem for you.

[Chart: Newer reasoning models hallucinated more, not less. Hallucination rate on OpenAI's PersonQA test, which asks about real people; higher is worse. Source: OpenAI o3 and o4-mini system card, April 2025. o3-mini and o1 figures from the same OpenAI reporting.]

Even the expensive tools do it

Maybe the polished, paid tools have solved it. Stanford researchers checked. They tested AI legal research tools that cost thousands of dollars a month and were marketed as reducing or even eliminating made-up answers. The tools were better than a free chatbot. They were not fixed. The purpose-built legal tools still gave incorrect or unsupported answers between 17 and 33 percent of the time. A general-purpose chatbot was wrong on about 43 percent of the same legal questions.

[Chart: Even purpose-built legal AI still gets it wrong. Share of legal questions with a made-up or unsupported answer. Source: Stanford RegLab and HAI, "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools." Tools tested 2024, published 2025.]

The lesson is not that one brand is good and another bad. It is that even the best of them, sold for serious money, miss often enough that you cannot skip the checking.

Your business can be on the hook for it

This is not only a risk for the person typing the question. In 2022, a man used Air Canada's website chatbot to ask about bereavement fares. The bot described a refund policy that let him claim a discount after the fact. That policy did not exist. The bot made it up. When the airline refused the refund, he took it to a tribunal. Air Canada argued it should not be responsible for what its own chatbot said. In February 2024, the tribunal disagreed and ordered the airline to pay. If you are thinking about putting a chatbot on your own site, that is the line to remember. What the bot tells your customers is on you.

Why the confident tone works on us

There is a human side to this too. Researchers have found that the more detailed and polished an answer looks, the more we trust it, even when the extra length adds nothing accurate. A long, fluent, wrong answer is more dangerous than a short one, because it wins trust it has not earned on the facts.

Knowing where the risk concentrates makes it easier to stay out of trouble:

Check hard before you trust it

Specific facts and figures
Names and dates
Quotes and statistics
Citations, sources, and web links
Legal and medical detail
Recent events and niche topics

Generally safer, still review

Rewriting and adjusting tone
Summarizing a document you provide
Reorganizing your own notes
Brainstorming and outlines
Formatting and cleanup
Getting past a blank page

The habit that protects you

None of this makes the tools useless. They are good at plenty, and they save real time. It means you handle what they give you a certain way.

Treat AI output as a confident first draft, not as a fact.

A draft is a starting point you expect to check, not a finished answer you forward without reading. In practice:

Verify anything specific. Names, dates, numbers, prices, quotes, citations, policies. That is where the tool invents most, and where being wrong costs you.
Tell it that it is allowed to say "I don't know." A simple line like "If you are not sure, say so. Do not guess." can make it more willing to stop instead of fabricating.
Ask for sources, then actually open them. Asking is not proof. The lawyer above had citations that looked real. They were not.
Watch the "give me five" trap. Ask for five examples when only three exist, and it may invent the other two to fill the order.
Use it for what it is good at. Drafting, rewriting, summarizing a document you hand it, getting unstuck. Be careful leaning on it for facts you cannot confirm somewhere else.
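
For teams that want to build two of these steps into a workflow, here is a minimal sketch. It assumes nothing about any particular chatbot product. The function names, the patterns, and the example.com link are all made up for illustration. The first helper adds the "you may say I don't know" line to a question. The second lists the numbers, links, and quotes in a draft so a person can check each one. It verifies nothing on its own.

```python
import re

# Wording borrowed from the tip above; adjust to taste.
UNCERTAINTY_NOTE = "If you are not sure, say so. Do not guess."

def with_permission_to_abstain(question):
    """Append the 'you may say I don't know' line to a question."""
    return f"{question.strip()}\n\n{UNCERTAINTY_NOTE}"

# Rough, illustrative patterns for specifics worth checking by hand.
CHECK_PATTERNS = {
    "number": re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?"),
    "link": re.compile(r"https?://\S+"),
    "quote": re.compile(r'"[^"]+"'),
}

def flag_for_checking(draft):
    """List (kind, text) pairs for a person to verify."""
    flagged = []
    for kind, pattern in CHECK_PATTERNS.items():
        for match in pattern.finditer(draft):
            flagged.append((kind, match.group()))
    return flagged

draft = (
    "The firm was fined $5,000 in 2023. "
    'See https://example.com/ruling for the "full opinion".'
)
for kind, text in flag_for_checking(draft):
    print(f"CHECK {kind}: {text}")
```

The patterns are deliberately crude and will miss names, dates written in words, and plenty else. The list is a reminder of what to open and confirm, not a substitute for doing it.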

The tools are not going away, and neither is the confident tone. The fix is not fear. It is the same habit you already use with the acquaintance who is sometimes right and always sure. Nod, then check before you act on it. Building that check into how your team works is what an AI Strategy engagement is for. Tell us what is slowing you down.

Sources

Kalai, Nachum, Vempala, Zhang. "Why Language Models Hallucinate." OpenAI and Georgia Tech, September 2025. arXiv:2509.04664. arxiv.org/abs/2509.04664
"Why language models hallucinate." OpenAI, September 5, 2025. openai.com
OpenAI o3 and o4-mini System Card, April 16, 2025 (PersonQA hallucination rates). cdn.openai.com
Magesh et al. "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools." Stanford RegLab and HAI, Journal of Empirical Legal Studies, 2025. reglab.stanford.edu
Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023). Sanction ruling, June 22, 2023. CNBC report
Moffatt v. Air Canada. B.C. Civil Resolution Tribunal, February 2024. CBC News
Fake AI-generated summer reading list. Associated Press, May 2025 (also reported by the Chicago Sun-Times).
Steyvers and Kumar, on the gap between perceived and actual AI accuracy. Nature Machine Intelligence, 2025.
"Reduce hallucinations." Anthropic (Claude) documentation.
