Franz Broseph seemed like any other Diplomacy player to Claes de Graaff. The handle was a joke — Austrian Emperor Franz Joseph I reborn as an online bro — but that was the kind of humor that people who play Diplomacy tend to enjoy. The game is a classic, beloved by the likes of John F. Kennedy and Henry Kissinger, combining military strategy with political intrigue as it re-creates the First World War: Players negotiate with allies, enemies and everyone in between as they plan how their armies will move across 20th-century Europe.
When Franz Broseph joined a 20-player online tournament at the end of August, he wooed other players, lying to them and ultimately betraying them. He finished in first place.
De Graaff, a chemist living in the Netherlands, finished fifth. He had spent nearly 10 years playing Diplomacy, both online and at face-to-face tournaments across the globe. He did not realize until it was revealed several weeks later that he had lost to a machine. Franz Broseph was a bot.
“I was flabbergasted,” de Graaff, 36, said. “It seemed so genuine — so lifelike. It could read my texts and converse with me and make plans that were mutually beneficial — that would allow both of us to get ahead. It also lied to me and betrayed me, like top players frequently do.”
Built by a team of artificial intelligence researchers from the tech giant Meta, the Massachusetts Institute of Technology and other prominent universities, Franz Broseph is among the new wave of online chatbots that are rapidly moving machines into new territory.
When you chat with these bots, it can feel like chatting with another person. It can feel, in other words, like machines have passed a test that was supposed to prove their intelligence.
For more than 70 years, computer scientists have struggled to build technology that could pass the Turing test: the technological inflection point where we humans are no longer sure whether we are chatting with a machine or a person. The test is named for Alan Turing, the famed British mathematician, philosopher and wartime code breaker who proposed the test back in 1950. He believed it could show the world when machines had finally reached true intelligence.
The Turing test is a subjective measure. It depends on whether the people asking the questions feel convinced that they are talking to another person when in fact they are talking to a device.
But whoever is asking the questions, machines will soon leave this test in the rearview mirror.
Bots like Franz Broseph have already passed the test in particular situations, like negotiating Diplomacy moves or calling a restaurant for dinner reservations. ChatGPT, a bot released in November by OpenAI, a San Francisco lab, leaves people feeling as if they were chatting with another person, not a bot. The lab said more than a million people had used it. Because ChatGPT can write just about anything, including term papers, universities are worried it will make a mockery of class work. When some people talk to these bots, they even describe them as sentient or conscious, believing that machines have somehow developed an awareness of the world around them.
Privately, OpenAI has built a system, GPT-4, that is even more powerful than ChatGPT. It may even generate images as well as words.
And yet these bots are not sentient. They are not conscious. They are not intelligent — at least not in the way that humans are intelligent. Even people building the technology acknowledge this point.
“These systems can do a lot of useful things,” said Ilya Sutskever, chief scientist at OpenAI and one of the most important AI researchers of the past decade, referring to the new wave of chatbots. “On the other hand, they are not there yet. People think they can do things they cannot.”
As the latest technologies emerge from research labs, it is now obvious — if it was not obvious before — that scientists must rethink and reshape how they track the progress of artificial intelligence. The Turing test is not up to the task.
Time and time again, AI technologies have surpassed supposedly insurmountable tests, including mastery of chess (1997), “Jeopardy!” (2011), Go (2016) and poker (2019). Now it is surpassing another, and again this does not necessarily mean what we thought it would.
We — the public — need a new framework for understanding what AI can do, what it cannot, what it will do in the future and how it will change our lives, for better or for worse.
Five years ago, Google, OpenAI and other AI labs started designing neural networks that analyzed enormous amounts of digital text, including books, news stories, Wikipedia articles and online chat logs. Researchers call them “large language models.” Pinpointing billions of distinct patterns in the way people connect words, letters and symbols, these systems learned to generate their own text.
Six months before releasing its chatbot, OpenAI unveiled a tool called DALL-E.
A nod to both “WALL-E,” the 2008 animated movie about an autonomous robot, and Salvador Dalí, the surrealist painter, this experimental technology lets you create digital images simply by describing what you want to see. This is also a neural network, built much like Franz Broseph or ChatGPT. The difference is that it learned from both images and text. Analyzing millions of digital images and the captions that described them, it learned to recognize the links between pictures and words.
This is what’s known as a multimodal system. Google, OpenAI and other organizations are already using similar methods to build systems that can generate video of people and objects. Startups are building bots that can navigate software apps and websites on a user’s behalf.
These are not systems that anyone can properly evaluate with the Turing test — or any other simple method. Their end goal is not conversation.
Turing’s test judged whether a machine could imitate a human. This is how artificial intelligence is typically portrayed — as the rise of machines that think like people. But the technologies under development today are very different from you and me. They cannot deal with concepts they have never seen before. And they cannot take ideas and explore them in the physical world.
At the same time, there are many ways these bots are superior to you and me. They do not get tired. They do not let emotion cloud what they are trying to do. They can instantly draw on far larger amounts of information. And they can generate text, images and other media at speeds and volumes we humans never could.
Their skills will also improve considerably in the coming years.
In the months and years to come, these bots will help you find information on the internet. They will explain concepts in ways you can understand. If you like, they will even write your tweets, blog posts and term papers.
They will tabulate your monthly expenses in your spreadsheets. They will visit real estate websites and find houses in your price range. They will produce online avatars that look and sound like humans. They will make mini-movies, complete with music and dialogue.
Certainly, these bots will change the world. But the onus is on you to be wary of what these systems say and do, to edit what they give you, to approach everything you see online with skepticism. Researchers know how to give these systems a wide range of skills, but they do not yet know how to give them reason or common sense or a sense of truth.
That still lies with you.
This article originally appeared in The New York Times.
Get more business news by signing up for our Economy Now newsletter.