ChatGPT gets crushed at chess by a 1 MHz Atari 2600

Editor’s take: Despite being hailed as the next step in the evolution of artificial intelligence, large language models are no smarter than a piece of rotten wood. Every now and then, some odd experiment or test reminds everyone that so-called “intelligent” AI doesn’t actually exist if you’re living outside a tech company’s quarterly report.

A cycle-exact emulation of the Atari 2600 CPU running at a meager 1.19 MHz is more than enough to utterly humiliate ChatGPT in a game of chess. Citrix engineer Robert Jr. Caruso conducted the “funny” little experiment over the weekend, pitting OpenAI’s mighty chatbot against a virtual Atari 2600 console emulated by Stella. It didn’t end well for the chatbot.

Caruso reportedly got the idea from ChatGPT itself, after chatting with the bot about the history of AI and chess. OpenAI’s service volunteered to play “Atari Chess,” which Caruso assumed referred to Video Chess – the only chess title ever released for the Atari 2600.

Despite being given a basic layout of the board to identify the pieces, ChatGPT struggled. The bot mistook rooks for bishops, missed obvious pawn forks, and made a series of baffling blunders, according to Caruso. At one point, ChatGPT even blamed external factors, such as the abstract symbols Video Chess uses to depict the pieces, for its inability to keep track of the game state.

“For 90 minutes, I had to stop it from making awful moves and correct its board awareness multiple times per turn,” the engineer said of ChatGPT’s performance against an emulated console from the 1970s.

The bot apparently kept asking to restart the game in hopes of improving its performance, but was ultimately defeated by an 8-bit chess engine. A CPU running at roughly 1 MHz should, at best, be able to think one or two moves ahead, while ChatGPT relies on an endless army of modern, power-hungry GPUs to keep its chat service running. And yet the tiny CPU won, thrashing the chatbot at beginner level.

Caruso’s experiment is a useful reminder of what LLMs actually are: complex, heuristics-based black-box search engines designed to constantly please the end user with some sort of captivating result. They don’t “know” anything, have no reasoning or deduction capabilities, and certainly have no intelligence of their own. And they absolutely suck at chess.

I never owned an Atari 2600 back in the day, though I did spend some glorious afternoons with my mighty Intellivision console. Next time, I’ll try to humble ChatGPT by making it play a round of Battle Chess on an emulated replica of my first x86 machine: an 80286 running at a blazing 16 MHz.