In the digital age, our interactions with AI have become increasingly frequent, and often we do not even notice them. This has led researchers to investigate how convincingly AI can imitate human intelligence by conducting a modern-day Turing test.
The Turing test, originally proposed as “the imitation game” by computer scientist Alan Turing in 1950, evaluates a machine’s ability to exhibit intelligence indistinguishable from a human’s. To pass the test, a machine must hold a conversation with a person and convince them that it is human.
In this study, researchers recruited 500 participants to converse with four respondents: a human, the 1960s-era AI program ELIZA, and two advanced AI models, GPT-3.5 and GPT-4.
The results of the study, published May 9 on the preprint server arXiv, revealed that participants judged GPT-4 to be human 54% of the time.
The Chatbot of the 1960s
In contrast, ELIZA, a system pre-programmed with responses but lacking a large language model (LLM) or neural network architecture, was perceived as human in only 22% of the interactions. GPT-3.5 scored 50%, while the human participant was correctly identified as human 67% of the time.
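For readers curious what “a system pre-programmed with responses” looks like in practice, here is a minimal, hypothetical sketch in the spirit of ELIZA’s rule-based design (not Weizenbaum’s actual code): a fixed table of pattern-to-template rules, with a stock fallback reply when nothing matches. The patterns and replies are illustrative inventions.

```python
import re

# Hypothetical miniature in the spirit of ELIZA's rule-based design:
# a fixed, hand-written list of (pattern, reply template) rules.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."  # stock reply when no pattern matches


def respond(message: str) -> str:
    """Return a canned reply: the first matching rule wins."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Echo the matched fragment back inside a fixed template.
            return template.format(match.group(1).strip(" .!?"))
    return FALLBACK


if __name__ == "__main__":
    print(respond("I am feeling anxious."))  # -> How long have you been feeling anxious?
    print(respond("It rained yesterday."))   # -> Please go on.
```

Every possible exchange must be anticipated by hand in such a system, which is why, as Watson notes below, the illusion tends to collapse within minutes.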
“Machines can confabulate, mashing together plausible ex-post-facto justifications for things, as humans do,” said Nell Watson, an AI researcher at the Institute of Electrical and Electronics Engineers (IEEE).
“They can be subject to cognitive biases, bamboozled and manipulated, and are becoming increasingly deceptive. All these elements mean human-like foibles and quirks are being expressed in AI systems, which makes them more human-like than previous approaches that had little more than a list of canned responses.”
The First Step Towards Human-Like AI Conversations
Watson noted that the study represents a challenge for future human-machine interaction, predicting that we will become increasingly paranoid about the true nature of our interactions, especially in sensitive matters. She added that the study highlights how AI has changed during the GPT era.
“ELIZA was limited to canned responses, which greatly limited its capabilities. It might fool someone for five minutes, but soon the limitations would become clear,” she said.
“Language models are endlessly flexible, able to synthesize responses to a broad range of topics, speak in particular languages or sociolects and portray themselves with character-driven personality and values. It’s an enormous step forward from something hand-programmed by a human being, no matter how cleverly and carefully.”