Why these tools make things up, why they sound so sure doing it, and the one habit that protects you.
Ask an AI chatbot a question and you will almost always get an answer. It will be clear, organized, and sure of itself. Most of the time it is right. Some of the time it is making the whole thing up, and it sounds exactly as sure either way. That last part is the problem.
What it means when AI "makes things up"
People who build these tools call it a hallucination. Strip out the jargon and it means the chatbot invents something and states it as fact. A figure, a quote, a court case, a web link, a customer policy. It does not hedge or flag the invention. It hands you a fabrication in the same calm voice it uses for everything that happens to be true.
Two real examples, with real consequences
In 2023, a New York lawyer used ChatGPT to help write a legal brief. The tool produced six court cases that backed his argument perfectly. They had names, citations, and quotes. None of them existed. When he started to doubt them, he asked ChatGPT whether they were real. It told him yes. A judge fined him, a second lawyer, and their firm $5,000, and the case became a standard cautionary tale for the legal profession.
It is not just lawyers. In the spring of 2025, a syndicated summer reading list ran in real newspapers, including the Chicago Sun-Times. Several of the books on it did not exist. The titles had been invented by AI and attached to real, well-known authors. The writer had used a chatbot for research and had not checked the results before they went to print.
It is not lying. It does not know enough to lie.
It helps to know what these tools actually do. A chatbot is not looking facts up in a database. It predicts the next word in a sentence, over and over, from patterns it learned across an enormous amount of text. It is very good at producing writing that sounds right. There is no separate step where it checks whether what it just said is true, and no real sense of "I do not know this one."
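To make the "predict the next word" idea concrete, here is a toy sketch. It is not how a real chatbot is built: real systems use large neural networks trained on vastly more text. The corpus, the function names, and the word-pair approach are all assumptions chosen for illustration. What it does share with the real thing is the shape of the loop: pick a likely next word, repeat, and never check the result against the facts.

```python
import random
from collections import defaultdict

# Tiny made-up training text; a real model learns from vastly more.
CORPUS = (
    "the court ruled for the airline . "
    "the court ruled for the customer . "
    "the airline paid the customer ."
)

def build_bigrams(text):
    """Map each word to the list of words that followed it in the text."""
    followers = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, max_words=8, seed=0):
    """Repeatedly pick a word that has followed the previous one.

    Nothing in this loop checks whether the finished sentence is true.
    """
    rng = random.Random(seed)
    out = [start]
    for _ in range(max_words - 1):
        options = followers.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
        if out[-1] == ".":
            break
    return " ".join(out)

followers = build_bigrams(CORPUS)
for seed in range(3):
    print(generate(followers, "the", seed=seed))
```

Every word pair in the output appeared somewhere in the training text, so the result reads smoothly. The sentence as a whole may still be one the text never said, which is the small-scale version of a fluent fabrication.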
That is why rare, specific facts are the danger zone. A common fact shows up countless times in the training text, so the tool reproduces it easily. An obscure one (a single person's birthday, a small case from a minor court, a niche statistic) may appear once or not at all. Asked for it, the tool does not stop. It fills the gap with something shaped like an answer.
In September 2025, researchers at OpenAI and Georgia Tech argued that the problem is baked in deeper than bad data. The way these systems are trained and scored rewards confident guessing. Picture a multiple-choice exam where a blank earns nothing but a guess might earn a point. Over time the tool learns that guessing beats admitting it is unsure. Other researchers say that is part of the story rather than all of it, but the takeaway holds: these tools are built to always produce an answer, not to tell you when they do not have one.
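The exam analogy comes down to simple arithmetic, sketched here. The function name and the probabilities are illustrative assumptions, not figures from the research. The `wrong_penalty` option is a hypothetical variation added only to show how the incentive changes when a wrong answer costs something.

```python
def expected_score(p_correct, wrong_penalty=0.0):
    """Average points from guessing: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

ABSTAIN_SCORE = 0.0  # a blank, or "I don't know", earns nothing

# p is the chance the guess happens to be right (made-up values).
for p in (0.1, 0.25, 0.5):
    plain = expected_score(p)
    penalized = expected_score(p, wrong_penalty=1.0)
    print(
        f"chance right={p:.2f}  guess={plain:+.2f}  "
        f"guess with penalty={penalized:+.2f}  abstain={ABSTAIN_SCORE:+.2f}"
    )
```

With no penalty, any guess with even a small chance of being right scores better on average than a blank. Once a wrong answer costs a point, a long-shot guess scores worse than admitting uncertainty.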
Newer is not automatically safer
It is tempting to assume the next, smarter version fixes this. It does not always. When OpenAI tested its own newer reasoning models on a benchmark about real people, the newer ones hallucinated more often than the older model, not less. OpenAI said plainly that it was not sure why. Newer models can also do better on other tests, so this is not a straight line in one direction. The point is simpler: do not assume a newer or pricier model has solved the problem for you.
Even the expensive tools do it
Maybe the polished, paid tools have solved it. Stanford researchers checked. They tested AI legal research tools that cost thousands of dollars a month and were marketed as reducing or even eliminating made-up answers. The tools were better than a free chatbot. They were not fixed. The purpose-built legal tools still gave incorrect or unsupported answers between 17 and 33 percent of the time. A general tool was wrong on about 43 percent of the same legal questions.
The lesson is not that one brand is good and another bad. It is that even the best of them, sold for serious money, miss often enough that you cannot skip the checking.
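A rough calculation shows why those rates rule out skipping the check. The sketch below takes the 17 and 33 percent figures above and asks how likely it is that at least one answer in a batch is wrong. It assumes every answer has the same error rate and that errors are independent, which is a simplification and not something the study claims. The function name is made up for the example.

```python
def chance_of_at_least_one_error(error_rate, n_answers):
    """Probability that one or more of n answers is wrong.

    Assumes the same error rate for each answer and independent errors.
    """
    return 1.0 - (1.0 - error_rate) ** n_answers

for rate in (0.17, 0.33):
    for n in (1, 5, 10):
        chance = chance_of_at_least_one_error(rate, n)
        print(f"error rate {rate:.0%}, {n:2d} answers: {chance:.0%} chance of at least one error")
```

Under those assumptions, even at the lower rate, a handful of unchecked answers is more likely than not to contain a mistake.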
Your business can be on the hook for it
This is not only a risk for the person typing the question. In 2022, a man used Air Canada's website chatbot to ask about bereavement fares. The bot described a refund policy that let him claim a discount after the fact. That policy did not exist. The bot made it up. When the airline refused the refund, he took it to a tribunal. Air Canada argued it should not be responsible for what its own chatbot said. In February 2024, the tribunal disagreed and ordered the airline to pay. If you are thinking about putting a chatbot on your own site, that is the line to remember. What the bot tells your customers is on you.
Why the confident tone works on us
There is a human side to this too. Researchers have found that the more detailed and polished an answer looks, the more we trust it, even when the extra length adds nothing accurate. A long, fluent, wrong answer is more dangerous than a short one, because it wins trust it has not earned on the facts.
Knowing where the risk concentrates makes it easier to stay out of trouble:
Check hard before you trust it
- Specific facts and figures
- Names and dates
- Quotes and statistics
- Citations, sources, and web links
- Legal and medical detail
- Recent events and niche topics
Generally safer, still review
- Rewriting and adjusting tone
- Summarizing a document you provide
- Reorganizing your own notes
- Brainstorming and outlines
- Formatting and cleanup
- Getting past a blank page
The habit that protects you
None of this makes the tools useless. They are good at plenty, and they save real time. It means you handle what they give you a certain way.
Treat AI output as a confident first draft, not as a fact.
A draft is a starting point you expect to check, not a finished answer you forward without reading. In practice:
- Verify anything specific. Names, dates, numbers, prices, quotes, citations, policies. That is where the tool invents most, and where being wrong costs you.
- Tell it that it is allowed to say "I don't know." A simple line like "if you are not sure, say so, do not guess" makes it more willing to stop instead of fabricate.
- Ask for sources, then actually open them. Asking is not proof. The lawyer above had citations that looked real. They were not.
- Watch the "give me five" trap. Ask for five examples when only three exist, and it may invent the other two to fill the order.
- Use it for what it is good at. Drafting, rewriting, summarizing a document you hand it, getting unstuck. Be careful leaning on it for facts you cannot confirm somewhere else.
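Two of the habits above can be partly turned into a routine. The sketch below is one possible approach, not a standard tool: `build_prompt`, `flag_specifics`, and the patterns are hypothetical names and rules invented for this example, and the sample answer is made up. It adds the "say so if unsure" line to a question, then lists the numbers, links, and quotes in a reply so you know what to check by hand. It does not verify anything, and it will miss names and plenty else.

```python
import re

UNSURE_LINE = "If you are not sure, say so. Do not guess."

def build_prompt(question):
    """Append explicit permission to say "I don't know" to a question."""
    return f"{question}\n\n{UNSURE_LINE}"

# Rough patterns for details worth checking; illustrative, not exhaustive.
PATTERNS = {
    "number": re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?"),
    "link": re.compile(r"https?://\S+"),
    "quote": re.compile(r"\"[^\"]+\""),
}

def flag_specifics(text):
    """Return {kind: [matches]} for items a person should verify."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}

# Made-up reply used only to demonstrate the flagging.
sample_answer = (
    "The court fined the firm $5,000 in 2023. "
    'See https://example.com/ruling for "the full order".'
)

print(build_prompt("What did the court decide?"))
for kind, items in flag_specifics(sample_answer).items():
    print(kind, items)
```

The output is a to-do list, not a verdict. Each flagged item still has to be opened, read, and confirmed somewhere else.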
The tools are not going away, and neither is the confident tone. The fix is not fear. It is the same habit you already use with the acquaintance who is sometimes right and always sure. Nod, then check before you act on it.
Sources
- Kalai, Nachum, Vempala, Zhang. "Why Language Models Hallucinate." OpenAI and Georgia Tech, September 2025. arXiv:2509.04664. arxiv.org/abs/2509.04664
- "Why language models hallucinate." OpenAI, September 5, 2025. openai.com
- OpenAI o3 and o4-mini System Card, April 16, 2025 (PersonQA hallucination rates). cdn.openai.com
- Magesh et al. "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools." Stanford RegLab and HAI, Journal of Empirical Legal Studies, 2025. reglab.stanford.edu
- Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023). Sanction ruling, June 22, 2023. CNBC report
- Moffatt v. Air Canada. B.C. Civil Resolution Tribunal, February 2024. CBC News
- Fake AI-generated summer reading list. Associated Press, May 2025 (also reported by the Chicago Sun-Times).
- Steyvers and Kumar, on the gap between perceived and actual AI accuracy. Nature Machine Intelligence, 2025.
- "Reduce hallucinations." Anthropic (Claude) documentation.