When ChatGPT speaks, does it think?
What happens when someone turns to an AI for emotional support, thinking it understands their pain? What happens when AI-generated misinformation spreads because it sounds trustworthy?
One of the most famous questions in the history of science is whether machines can think. When Alan Turing posed it in 1950, he wasn’t just trying to stir debate. He imagined a future in which computers might one day behave like people. To tackle this big question, he came up with a practical test. If a machine could chat with a human without that person realising they were talking to a machine, then perhaps that machine could be considered “intelligent”.
This simple challenge, known as the Turing Test, has guided artificial intelligence research for over 70 years. But today, with AI systems like GPT-4.5 reportedly outperforming humans in some of these tests, we’re faced with a deeper question: Does sounding human mean being intelligent?
To understand the spirit behind the Turing Test, we must go back to Turing’s 1950 paper Computing Machinery and Intelligence. In it, Turing introduced what he called the “imitation game”. This game initially involved three participants—a man, a woman, and a judge. The judge, isolated from the others, communicated via typewritten messages and had to guess which was the man and which was the woman. Turing then suggested a twist—replace one of the people with a machine. If the judge still couldn’t tell who was who, could we say that the machine was thinking? Turing’s genius lay in moving the conversation away from abstract definitions of ‘thought’ or ‘intelligence’ and toward observable behaviour. What mattered wasn’t how the machine worked inside but whether it could imitate human conversation convincingly enough to fool someone on the outside.
This shift in how we judge intelligence—from what happens inside a mind to how a system behaves—shaped AI research for decades. But progress was slow. For much of the 20th century, no machine came close to passing the test. Frustrated by the lack of breakthroughs, Dr Hugh Loebner created the Loebner Prize in 1990, the first formal competition based on the Turing Test, hoping to jump-start progress in natural language AI. A gold medal and a $100,000 prize were offered to the first programme to pass an unrestricted version of the test. Until that happened, smaller annual prizes were awarded to the chatbot judged most human-like in short conversations. Yet despite decades of contests, the grand prize was never claimed. No machine ever truly passed the full Turing Test under rigorous conditions, though some came close by adopting clever personas or using scripted responses.
One of the most talked-about bots was Eugene Goostman, which presented itself as a 13-year-old Ukrainian boy. This persona cleverly excused its occasional grammar mistakes and strange responses, and judges sometimes assumed its quirks were human. Other successful entrants like Mitsuku, Cleverbot, and Rose relied on large databases of phrases or background stories to sustain conversations. But their success was limited. Often, these programmes dodged tough questions, repeated themselves, or gave odd replies that revealed their artificial nature. While entertaining, they never convincingly demonstrated genuine understanding.
All of that changed with the arrival of today’s large language models, like GPT-4.5. These systems are trained on massive amounts of text and can generate human-like responses across an astonishing range of topics. In a recent academic study, GPT-4.5 reportedly passed a three-party Turing Test, being mistaken for a human 73% of the time—more often than the actual humans taking part. The secret? Researchers instructed the model to adopt the voice of a socially awkward young adult who used casual slang and made occasional slip-ups. This characterisation made the AI seem more relatable and less robotic. However, when the same model was tested without that persona, its success rate dropped to 36%.
These results raise an important question: Has the Turing Test been passed in spirit or only in appearance? The answer depends on what we mean by ‘intelligence’. While GPT-4.5 can produce remarkably coherent, often insightful language, it doesn’t understand what it’s saying. It doesn’t have beliefs, emotions, goals, or consciousness. It can talk about love, grief, or climate change with poetic clarity—but it doesn’t experience or comprehend any of these things.
Philosopher John Searle made this distinction vivid with his now-famous Chinese Room argument. He imagined a person locked in a room, knowing no Chinese, given a rulebook for manipulating Chinese symbols in response to questions. From the outside, the replies appear fluent. But the person inside doesn’t understand Chinese—they’re just following instructions. Searle’s point is that a machine might simulate understanding without having it. And that’s exactly what systems like GPT do—they predict words based on patterns, not meaning. They’re not thinking, they’re imitating.
A helpful analogy is this: An aeroplane flies, but it isn’t a bird. A submarine swims, but it isn’t a fish. They achieve similar results using completely different methods. The same is true for AI. It can produce language resembling human thought, but it doesn’t think like we do. It lacks self-awareness, context, and experience. And yet, it may outperform humans in many tasks—just as a plane can fly faster than any bird.
This difference matters. If we mistake fluent language for genuine thought, we risk giving machines roles they’re not suited to fill. Chatbots are already being used in customer service, therapy, and education. What happens when someone turns to an AI for emotional support, thinking it understands their pain? What happens when AI-generated misinformation spreads because it sounds trustworthy? The more realistic AI becomes, the more critical it is to remember that sounding human is not the same as being human.
This also calls for new ways to measure AI progress. For all its brilliance, the Turing Test may no longer be enough. It tells us how well a machine imitates us, but not whether it truly understands, reasons, or behaves ethically. Future benchmarks may need to test for judgement, values, or emotional intelligence, not just verbal fluency.
So, where does this leave us? We’ve built machines that can talk, answer questions, and even joke with us. Some can now outperform people in sounding human. But that doesn’t make them human. It doesn’t even make them intelligent in the way we are. The Turing Test asked whether a machine could convince us it was a person. Increasingly, the answer is yes. But perhaps the real test now is whether we can remember the difference.