Students Using ChatGPT Score Worse Despite Solving More Problems

Researchers compared the math improvement of nearly 1,000 high school students.

Cristian G. Guasch

  • September 11, 2024
  • Updated: July 1, 2025 at 10:58 PM

A recent study by University of Pennsylvania researchers produced some surprising results about using ChatGPT as a study assistant. High school students who used ChatGPT to practice math problems solved 48% more problems correctly during practice sessions. However, when it came to test time, they scored 17% lower than their peers who didn’t have access to the AI.

The study involved almost 1,000 Turkish students and also tested a more advanced version of ChatGPT that acted as a tutor by giving hints instead of answers.

Students using this AI tutor solved 127% more practice problems correctly than those without any tech aids. But despite the improved practice performance, they didn’t do better on the test. Those who practiced the old-fashioned way, without AI, matched the scores of their tech-assisted peers.

The study shows a big problem: students using ChatGPT treat it as a “crutch” and ask the chatbot for the answers instead of building problem-solving skills for independent learning. This reliance on AI tools seems to hinder actual learning gains.

ChatGPT’s problem-solving methods also had errors. It got only 50% of the math problems correct. Its step-by-step solutions were wrong 42% of the time, and its arithmetic was wrong 8% of the time. The fine-tuned tutoring version of ChatGPT, which was given the correct solutions, minimized these errors but didn’t improve overall test scores.

To make things more complicated, the study found that students overestimated their learning. Surveys conducted alongside the experiment showed that those who used ChatGPT or the AI tutor thought they did better on the test than they actually did. This overconfidence is like pilots relying on autopilot. It highlights the risks of over-reliance on AI in education.

Despite these findings, the researchers say more studies are needed to confirm them. The experiment was conducted in fall 2023 with Turkish students in grades 9-11, and it hasn’t been peer-reviewed. But it’s an early look into the potential downsides of using freely available AI chatbots like ChatGPT in the classroom.

Technologies like calculators and computers have long been part of education. ChatGPT introduces a new tradeoff: students can answer more problems correctly in practice but learn less in the long run.

Solving one problem with AI doesn’t mean they can solve the next.
