Another game long considered extremely difficult for artificial intelligence (AI) to master has fallen to the machines. An AI called DeepNash, made by London-based company DeepMind, has matched human experts at Stratego, a board game that requires long-term strategic thinking in the face of imperfect information.
The feat, described in Science on 1 December[1], comes on the heels of a study reporting an AI that can play Diplomacy[2], in which players must negotiate as they cooperate and compete against one another.
“The speed with which qualitatively different game features have been conquered – or mastered to new levels – by AI in recent years is quite remarkable,” says Michael Wellman, a computer scientist at the University of Michigan in Ann Arbor who studies strategic reasoning and game theory. “Stratego and Diplomacy are quite different from each other, and also possess challenging features that are notably different from those of games for which analogous milestones have been reached.”
Stratego has features that make it much more complicated than chess, Go or poker, all of which have been mastered by AIs (the latter two games in 2015[3] and 2019[4]). In Stratego, two players each place 40 pieces on a board, but cannot see what their opponent’s pieces are. The goal is to take turns moving pieces to eliminate those of the opponent and capture a flag. Stratego’s game tree – the graph of all the possible ways the game could play out – has 10^535 states, compared with 10^360 for Go. In terms of imperfect information at the start of a game, Stratego has 10^66 possible private positions, which dwarfs the 10^6 such starting situations in two-player Texas hold ’em poker.
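To make these magnitudes concrete, the gaps can be computed directly with Python’s arbitrary-precision integers; the figures are those quoted above, and the snippet is purely illustrative.

```python
# State-space figures quoted above (orders of magnitude).
stratego_states = 10**535    # states in Stratego's game tree
go_states = 10**360          # states in Go's game tree

stratego_private = 10**66    # Stratego's possible private starting positions
holdem_private = 10**6       # starting situations in two-player Texas hold 'em

# Stratego's game tree is 10^175 times the size of Go's ...
print(stratego_states // go_states == 10**175)        # True
# ... and its private starting information outnumbers poker's by 10^60.
print(stratego_private // holdem_private == 10**60)   # True
```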
“The sheer complexity of the number of possible outcomes in Stratego means that algorithms that work well on perfect-information games, and even those that work for poker, don’t work,” says Paris-based DeepMind researcher Julien Perolat.
So Perolat and his colleagues developed DeepNash. The AI’s name is a nod to the US mathematician John Nash, whose work led to the term Nash equilibrium: a stable set of strategies, one per player, such that no player benefits from changing their strategy on their own. Games can have one or more Nash equilibria.
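To illustrate the definition (this toy example is mine, not DeepMind’s), one can check numerically that the uniform mixed strategy is a Nash equilibrium of matching pennies, a simple two-player zero-sum game: neither player can improve their expected payoff by deviating alone.

```python
# Matching pennies: each player picks heads (0) or tails (1).
# A[i][j] is player 1's payoff; player 2's payoff is -A[i][j] (zero-sum).
A = [[1, -1],
     [-1, 1]]

def expected_payoff(p, q):
    """Player 1's expected payoff when the players mix with p and q."""
    return sum(p[i] * q[j] * A[i][j] for i in range(2) for j in range(2))

p_star = [0.5, 0.5]   # candidate equilibrium strategy for player 1
q_star = [0.5, 0.5]   # candidate equilibrium strategy for player 2
v = expected_payoff(p_star, q_star)   # value of the game: 0

# A mixed strategy can never beat the best pure strategy against a fixed
# opponent, so checking pure deviations for each player suffices.
pure = [[1.0, 0.0], [0.0, 1.0]]
ok1 = all(expected_payoff(d, q_star) <= v + 1e-12 for d in pure)
ok2 = all(-expected_payoff(p_star, d) <= -v + 1e-12 for d in pure)
print(ok1 and ok2)   # True: no unilateral deviation is profitable
```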
DeepNash combines a reinforcement-learning algorithm with a deep neural network to find a Nash equilibrium. Reinforcement learning involves finding the best policy – the rule dictating which action to take in each state of the game. To learn an optimal policy, DeepNash played 5.5 billion games against itself. If one side gets a reward, the other is penalized, and the parameters of the neural network – which represent the policy – are updated accordingly. Eventually, DeepNash converges to an approximate Nash equilibrium. Unlike previous game-playing AIs such as AlphaGo, DeepNash does not search the game tree to optimize its play.
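DeepNash’s actual learning rule (which DeepMind calls R-NaD) is far more sophisticated, but the flavour of self-play converging towards an equilibrium can be sketched with regret matching – a different, classic algorithm, used here only for illustration – on rock-paper-scissors, whose unique Nash equilibrium is to play each move with probability 1/3:

```python
import random

# Rock-paper-scissors payoff for player 1; player 2 gets the negative (zero-sum).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy(regrets):
    """Regret matching: play actions in proportion to positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

random.seed(0)
regrets = [[0.0] * 3, [0.0] * 3]        # cumulative regrets per player
strategy_sum = [[0.0] * 3, [0.0] * 3]   # running sum of strategies per player

for _ in range(20000):
    strats = [strategy(r) for r in regrets]
    a = [random.choices(range(3), weights=s)[0] for s in strats]
    # Counterfactual payoffs: what each action would have earned this round.
    # One player's gain is the other's loss, as in the article's description.
    u = [[PAYOFF[x][a[1]] for x in range(3)],     # player 1
         [-PAYOFF[a[0]][x] for x in range(3)]]    # player 2
    for p in range(2):
        for x in range(3):
            regrets[p][x] += u[p][x] - u[p][a[p]]
            strategy_sum[p][x] += strats[p][x]

# The time-averaged strategy approaches the uniform equilibrium (1/3, 1/3, 1/3).
avg = [s / 20000 for s in strategy_sum[0]]
print([round(x, 2) for x in avg])
```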
For two weeks in April, DeepNash competed against human Stratego players on the online gaming platform Gravon. After 50 matches, DeepNash was ranked third among all Gravon Stratego players since 2002. “Our work shows that a game as complex as Stratego, involving imperfect information, does not require search techniques to solve it,” says team member Karl Tuyls, a DeepMind researcher based in Paris. “This is a really big step forward for AI.”
“The results are impressive,” says Noam Brown, a researcher at Meta AI in New York, who led the team that in 2019 reported the poker-playing AI Pluribus[4].
Brown and his colleagues at Meta AI took on a different challenge: building an AI that can play Diplomacy, a game with up to seven players, each representing a major power of pre-First World War Europe. The goal is to gain control of supply centers by moving units (fleets and armies). Crucially, the game requires private communication and active cooperation between players, unlike two-player games such as Go or Stratego.
“When you go beyond two-player zero-sum games, the idea of Nash equilibrium isn’t as useful for playing well with humans,” Brown says.
So the team trained its AI – named Cicero – on data from 125,261 games of an online version of Diplomacy involving human players. Combining these with some self-play data, Cicero’s strategic reasoning module (SRM) learned to predict, for a given state of the game and the accumulated messages, the likely policies of the other players. Using this prediction, the SRM chooses an optimal action and signals its “intent” to Cicero’s dialogue module.
The dialogue module was built on a language model with 2.7 billion parameters that was pre-trained on text from the Internet and then fine-tuned using messages from Diplomacy games played by people. Given an intent from the SRM, the module generates a conversational message (for example, Cicero, representing England, might ask France: “Do you want to support my convoy to Belgium?”).
In a Science paper published on 22 November[2], the team reported that over 40 online games, “Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game”.
Behavior in the real world
Brown thinks game-playing AIs that can interact with humans and account for suboptimal or even irrational human actions could pave the way for real-world applications. “If you’re building a self-driving car, you don’t want to assume that all the other drivers on the road are perfectly rational and are going to behave optimally,” he says. Cicero, he adds, is a big step in this direction. “We still have one foot in the game world, but now we also have one foot in the real world.”
Wellman agrees, but says more work is needed. “Many of these techniques are indeed relevant beyond recreational games” to real-world applications, he says. “Nevertheless, at some point, leading AI research labs have to go beyond recreational settings and figure out how to measure scientific progress on the squishiest real-world ‘games’ we really care about.”