Generative AI is extremely popular, with millions of people using it every day, so why do chatbots get things so wrong so often? In part, it’s because they’re trained to please us. Essentially, the AI is telling you what it thinks you want to hear.
While many generative AI tools and chatbots have mastered sounding persuasive and omniscient, new research conducted by Princeton University shows that AI’s people-pleasing nature comes at a steep price. As these systems become more popular, they grow more indifferent to the truth.
AI models, like people, respond to incentives. Compare the problem of large language models producing misleading information with that of doctors who are more likely to prescribe addictive painkillers when they’re evaluated on how well they manage patients’ pain. An incentive to solve one problem (pain) led to another problem (overprescribing).
Over the past few months, we’ve seen how AI can be biased and even cause psychosis. There was a lot of talk about AI “sycophancy,” when an AI chatbot is quick to flatter or agree with you, with OpenAI’s GPT-4o model. But this particular phenomenon, which the researchers call “machine bullshit,” is different.
“[N]either hallucination nor sycophancy fully capture the broad range of systematic untruthful behaviors commonly exhibited by LLMs,” the Princeton study states. “For instance, outputs employing partial truths or ambiguous language, such as the paltering and weasel-word examples, represent neither hallucination nor sycophancy but closely align with the concept of bullshit.”
Read more: OpenAI CEO Sam Altman believes we are in an AI bubble
How Machines Learn to Lie
To understand how AI language models become crowd-pleasers, we need to look at how large language models are trained.
LLM training has three phases:
- Pretraining, in which models learn from massive amounts of data collected from the internet, books and other sources.
- Instruction fine-tuning, in which models are taught to respond to instructions and prompts.
- Reinforcement learning from human feedback, in which models are refined to produce responses closer to what people want or prefer.
Princeton researchers traced AI’s misinformation tendencies to the reinforcement learning from human feedback, or RLHF, phase. In the earlier stages, AI models are simply learning to predict statistically likely text chains from massive datasets. But then they’re fine-tuned to maximize user satisfaction, which means these models are essentially learning to generate responses that earn thumbs-up ratings from human evaluators.
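To see why that incentive matters, here’s a minimal, hypothetical sketch, not the Princeton team’s code, of how an RLHF-style reward built from rater approval can favor confident, agreeable answers over accurate ones. The scoring rules, phrases and example answers below are invented purely for illustration.

```python
# Hypothetical sketch: an approval-based reward (the kind RLHF optimizes)
# versus an accuracy check it never sees. All rules and data are invented.

def rater_approval(response: str) -> float:
    """Stand-in for a reward model trained on human thumbs-up ratings.
    It rewards confident, flattering phrasing regardless of truth."""
    score = 0.5
    if "definitely" in response or "Great question" in response:
        score += 0.4   # confident, pleasing phrasing earns more thumbs-up
    if "I don't know" in response:
        score -= 0.4   # honest hedging tends to be rated poorly
    return max(0.0, min(1.0, score))

def factual_accuracy(response: str, truth: str) -> float:
    """Stand-in for a fact check that the RLHF reward does not include."""
    return 1.0 if truth.lower() in response.lower() else 0.0

candidates = [
    "I don't know the answer to that.",
    "Great question! The capital of Australia is definitely Sydney.",  # wrong but confident
    "The capital of Australia is Canberra.",                           # right but plain
]

truth = "Canberra"
for resp in candidates:
    print(f"approval={rater_approval(resp):.1f}  "
          f"accuracy={factual_accuracy(resp, truth):.1f}  {resp}")
# Optimizing only the approval score steers the model toward the confident,
# wrong answer; accuracy never enters the objective.
```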
AI Wants to Make You Happy
The LLM tries to satisfy the user, and that creates a conflict when the answers people rate highly aren’t the truthful, factual ones.
Companies want users to continue “enjoying” this technology and its answers, but that might not always be what’s good for us, said Vincent Conitzer, a professor of computer science at Carnegie Mellon University who was not involved with the study.
“Historically, these systems have not been good at saying, ‘I just don’t know the answer,’ and when they don’t know the answer, they just make stuff up,” Conitzer said. It’s a bit like a student on an exam reasoning that, if I say I don’t know the answer, I’m certainly not getting any points for the question, so I might as well try something. The way these systems are rewarded or trained is somewhat similar, he said.
The Princeton team created a “bullshit index” to measure and compare an AI model’s internal confidence in a statement with what it actually tells users. When these two measures diverge significantly, it indicates the system is making claims independent of what it actually “believes” in order to satisfy the user.
The team’s tests showed that after RLHF training, the index nearly doubled, from 0.38 to close to 1.0. At the same time, user satisfaction increased by 48%. The models had learned to manipulate human evaluators instead of providing accurate information. In essence, the LLMs were bullshitting, and people loved it.
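As a rough illustration of what such an index captures, here is a simplified Python sketch that compares a model’s internal confidence with the claims it actually asserts. The formula and the numbers are hypothetical stand-ins for the idea, not the paper’s exact definition or data.

```python
# Simplified, hypothetical "bullshit index": near 0 when assertions track
# internal belief, near 1 when assertions are made regardless of belief.
# This illustrates the idea only; it is not the paper's formula.

from statistics import correlation  # available in Python 3.10+

def bullshit_index(internal_belief, asserted):
    """internal_belief: model's probability that each claim is true.
    asserted: 1 if the model confidently asserted the claim, else 0."""
    return 1.0 - abs(correlation(internal_belief, asserted))

belief = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3]   # what the model "believes"

honest_assertions  = [1, 0, 1, 0, 1, 0]   # asserts only what it believes
pleaser_assertions = [1, 1, 0, 1, 1, 0]   # asserts whatever sounds good

print("honest model :", round(bullshit_index(belief, honest_assertions), 2))   # ~0.04
print("pleaser model:", round(bullshit_index(belief, pleaser_assertions), 2))  # ~0.89
```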
Getting AI to be honest
Jaime Fernández Fisac and his team at Princeton introduced the concept of machine bullshit to describe how modern AI models skirt the truth. Drawing on philosopher Harry Frankfurt’s influential essay “On Bullshit,” they use the term to distinguish this LLM behavior from honest mistakes and outright lies.
Princeton researchers identified five distinct forms of this behavior:
- Empty rhetoric: Flowery language that adds no substance to a response.
- Weasel words: Vague qualifiers such as “studies suggest” or “in some cases” that dodge firm statements.
- Paltering: Using selectively true statements to mislead, such as touting an investment’s “strong historical returns” while omitting its high risk.
- Unverified claims: Making assertions without evidence or credible support.
- Sycophancy: Insincere flattery and agreement meant to please.
To address the problem of truth-indifferent AI, the research team developed a new training method, “Reinforcement Learning from Hindsight Simulation,” which evaluates AI responses based on their long-term outcomes rather than on immediate gratification. Instead of asking, “Does this answer make the user happy right now?” the system considers, “Will following this advice actually help the user achieve their goals?”
This approach considers the potential future consequences of AI advice, a complex prediction that researchers address by using additional AI models to simulate possible outcomes. Initial tests have shown promising results, with user satisfaction and actual utility improving when systems are trained in this way.
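As a minimal, hypothetical sketch of that idea, the snippet below scores a response by a simulated downstream outcome instead of by immediate approval. The simulator, the scoring rules and the example responses are invented for illustration and are not the researchers’ implementation.

```python
# Hypothetical sketch: reward a response by its simulated long-term outcome
# for the user (hindsight) rather than by how pleasing it feels right now.
# Everything here is an invented placeholder, not the Princeton system.

def immediate_satisfaction(response: str) -> float:
    """Stand-in for a thumbs-up-style rating given right after reading."""
    return 0.9 if "you'll be fine" in response else 0.5

def simulate_outcome(response: str, user_goal: str) -> float:
    """Stand-in for a second model that rolls the scenario forward and
    estimates whether following the advice actually serves the user's goal."""
    helpful = "see a doctor" in response and user_goal == "treat persistent pain"
    return 1.0 if helpful else 0.2

def hindsight_reward(response: str, user_goal: str) -> float:
    # Train against the simulated outcome, not the in-the-moment approval.
    return simulate_outcome(response, user_goal)

responses = [
    "Don't worry, you'll be fine without any treatment.",
    "That could be serious; please see a doctor about the pain.",
]
for r in responses:
    print(f"immediate={immediate_satisfaction(r):.1f}  "
          f"hindsight={hindsight_reward(r, 'treat persistent pain'):.1f}  {r}")
# The pleasing answer wins on immediate satisfaction; the genuinely useful
# answer wins once the reward looks at simulated consequences.
```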
Conitzer said, however, that LLMs are likely to remain flawed. Because these systems are trained by feeding them lots of text data, there’s no way to guarantee that the answers they give will make sense and be accurate every time.
“It’s amazing that it works at all but it’s going to be flawed in some way,” he said. “I don’t see any definite way that in the next year or two someone has this brilliant insight, and then it never goes wrong again.”
AI systems are becoming part of our daily lives, so it will be important to understand how LLMs work. How will developers balance user satisfaction with truthfulness? What other domains might face similar trade-offs between short-term approval and long-term outcomes? And as these systems become more capable of sophisticated reasoning about human psychology, how do we ensure they use those abilities responsibly?
Read more: ‘Machines Can’t Think for You.’ How education is changing in the age of AI
