OpenAI's recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up. In fact, they hallucinate more than several of OpenAI's older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, affecting even today's best-performing systems. Historically, each new model has improved somewhat in the hallucination department, hallucinating less than its predecessor. That doesn't seem to be the case for o3 and o4-mini.
According to OpenAI's internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company's previous reasoning models (o1, o1-mini, and o3-mini) as well as OpenAI's traditional, "non-reasoning" models, such as GPT-4o.
Perhaps more concerning, the ChatGPT maker doesn't really know why it's happening.
In its technical report for o3 and o4-mini, OpenAI writes that "more research is needed" to understand why hallucinations are getting worse as it scales up reasoning models. O3 and o4-mini do perform better in some areas, including coding and math tasks. But because they "make more claims overall," they often end up making "more accurate claims as well as more inaccurate/hallucinated claims," per the report.
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company's in-house benchmark for measuring the accuracy of a model's knowledge about people. That's roughly double the hallucination rate of OpenAI's previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA, hallucinating 48% of the time.
Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro "outside of ChatGPT" and then copied the numbers into its answer. While o3 has access to some tools, it can't do that.
"Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines," said Neil Chowdhury, a Transluce researcher and former OpenAI employee, in an email to TechCrunch.
Sarah Schwettmann, co-founder of Transluce, added that o3's hallucination rate may make it less useful than it otherwise would be.
Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, told TechCrunch that his team is already testing o3 in their coding workflows, and that they've found it to be a step above the competition. However, Katanforoosh says that o3 tends to hallucinate broken website links. The model will supply a link that doesn't work when clicked.
Hallucinations may help models arrive at interesting ideas and be creative in their "thinking," but they also make some models a tough sell for businesses in markets where accuracy is paramount. For example, a law firm likely wouldn't be pleased with a model that inserts lots of factual errors into client contracts.
One promising approach to boosting the accuracy of models is giving them web search capabilities. OpenAI's GPT-4o with web search achieves 90% accuracy on SimpleQA. Potentially, search could improve reasoning models' hallucination rates as well, at least in cases where users are willing to expose their prompts to a third-party search provider.
If scaling up reasoning models indeed continues to worsen hallucinations, it will make the hunt for a solution all the more urgent.
"Addressing hallucinations across all our models is an ongoing area of research, and we're continually working to improve their accuracy and reliability," OpenAI spokesperson Niko Felix told TechCrunch in an email.
In the last year, the broader AI industry has pivoted to focus on reasoning models after techniques to improve traditional AI models started showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of computing and data during training. Yet it seems reasoning may also lead to more hallucinating, presenting a challenge.