Imagine two teams facing off on a soccer field. Players cooperate with teammates to achieve a goal and compete against opponents with conflicting interests. That’s how the game works.
Creating artificial intelligence agents that can learn to compete and cooperate as effectively as humans remains a thorny issue. A key challenge is to enable AI agents to anticipate future behaviors of other agents when they all learn simultaneously.
Due to the complexity of this problem, current approaches tend to be myopic; agents can only guess the next moves of their teammates or competitors, leading to poor long-term performance.
Researchers at MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents a farsighted perspective. Their machine learning framework allows cooperative or competitive AI agents to consider what other agents will do as time approaches infinity, not just over the next few steps. The agents then adapt their behaviors accordingly to influence the future behaviors of other agents and arrive at an optimal long-term solution.
This framework could be used by a group of autonomous drones working together to find a hiker lost in a thick forest, or by autonomous cars that strive to ensure passenger safety by anticipating the future movements of other vehicles traveling on a busy highway.
“When AI agents cooperate or compete, what matters most is when their behaviors converge at some point in the future. There are a lot of transient behaviors along the way that don’t matter much in the long run. Reaching this converged behavior is what we really care about, and we now have a mathematical way to enable it,” says Dong-Ki Kim, a graduate student in the Laboratory for Information and Decision Systems (LIDS) at MIT and lead author of a paper describing this framework.
The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others from the MIT-IBM Watson AI Lab, IBM Research, the Mila-Quebec Artificial Intelligence Institute, and the University of Oxford. The research will be presented at the Conference on Neural Information Processing Systems (NeurIPS).

In this demonstration video, the red robot, trained using the researchers’ machine learning system, defeats the green robot by learning more effective behaviors that exploit its opponent’s ever-changing strategy.
More agents, more problems
The researchers focused on a problem known as multi-agent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns through trial and error. Researchers give the agent a reward for “good” behaviors that help it achieve a goal. The agent adapts its behavior to maximize that reward until it eventually becomes an expert at a task.
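As a minimal sketch of that trial-and-error loop, here is a toy two-action problem, with made-up reward probabilities, that is not the researchers’ system:

```python
import random

# Toy illustration of reinforcement learning: the agent tries actions,
# observes rewards, and drifts toward the action with the highest
# estimated reward. The reward probabilities below are hypothetical.
REWARD_PROB = {"left": 0.3, "right": 0.8}   # assumed environment
values = {a: 0.0 for a in REWARD_PROB}      # estimated value per action
counts = {a: 0 for a in REWARD_PROB}
epsilon = 0.1                               # exploration rate

for step in range(5000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < epsilon:
        action = random.choice(list(REWARD_PROB))
    else:
        action = max(values, key=values.get)
    reward = 1.0 if random.random() < REWARD_PROB[action] else 0.0
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    values[action] += (reward - values[action]) / counts[action]

print(values)  # the estimate for "right" should approach 0.8
```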
But when many cooperative or competing agents learn simultaneously, the problem grows increasingly complex. As agents consider more future steps of their fellow agents, and how their own behavior influences others, the problem soon requires far too much computing power to solve efficiently. This is why other approaches focus only on the short term.
“AIs really want to think about the end of the game, but they don’t know when the game will end. They must think about how to keep adapting their behavior endlessly so that they can win at some point far in the future. Our paper basically proposes a new lens that enables an AI to think about infinity,” says Kim.
But since it’s impossible to plug infinity into an algorithm, the researchers designed their system so that agents focus on a future point where their behavior will converge with that of other agents, known as an equilibrium. An equilibrium point determines the long-term performance of agents, and multiple equilibria can exist in a multi-agent scenario. An effective agent therefore actively influences the future behaviors of other agents so that they reach an equilibrium that is desirable from the agent’s point of view. When all agents influence each other in this way, they converge toward an overall concept the researchers call an “active equilibrium.”
The machine learning framework they developed, known as FURTHER (which stands for FUlly Reinforcing acTive influence with averagE Reward), enables agents to learn how to adapt their behaviors as they interact with other agents to achieve this active equilibrium.
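The “average reward” in the name points to a standard idea from reinforcement learning theory: an infinite horizon becomes tractable if the agent optimizes its mean per-step reward rather than a sum that never ends. As a hedged sketch in textbook notation (not necessarily the paper’s exact formulation), agent $i$’s objective under its policy $\pi^i$ and the other agents’ policies $\pi^{-i}$ would be:

```latex
% Average-reward objective: the limit of the mean per-step reward
% as the horizon T grows without bound, which sidesteps summing
% rewards over an infinite future.
\rho^i(\pi^i, \pi^{-i}) \;=\;
  \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[\, \sum_{t=1}^{T} r^i_t \;\middle|\; \pi^i, \pi^{-i} \right]
```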
FURTHER does this using two machine learning modules. The first, an inference module, allows an agent to guess the future behaviors of other agents and the learning algorithms they use, based solely on their past actions.
This information is fed into the reinforcement learning module, which the agent uses to adapt its behavior and influence other agents in such a way as to maximize its reward.
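A minimal structural sketch of how such a two-module loop might fit together follows; the matrix game, the moving-average opponent model, and the greedy responder are illustrative assumptions, not the paper’s implementation:

```python
import random

# Illustrative two-module loop inspired by the description of FURTHER.
# The payoff table and both modules below are stand-ins for demonstration.

PAYOFF = {  # hypothetical row player's reward in a 2x2 matrix game
    (0, 0): 3.0, (0, 1): 0.0,
    (1, 0): 5.0, (1, 1): 1.0,
}

class InferenceModule:
    """Guesses the other agent's next move from its past actions."""
    def __init__(self, lr=0.1):
        self.p_opponent_plays_0 = 0.5  # prior belief
        self.lr = lr

    def update(self, observed_action):
        target = 1.0 if observed_action == 0 else 0.0
        # A moving average that drifts toward the opponent's recent
        # behavior, standing in for inferring its policy and learning rule.
        self.p_opponent_plays_0 += self.lr * (target - self.p_opponent_plays_0)

    def predict(self):
        return self.p_opponent_plays_0

class ReinforcementModule:
    """Picks the action with the highest expected reward under the
    inference module's prediction of the opponent's behavior."""
    def act(self, p_opp_0):
        expected = [
            p_opp_0 * PAYOFF[(a, 0)] + (1 - p_opp_0) * PAYOFF[(a, 1)]
            for a in (0, 1)
        ]
        return 0 if expected[0] >= expected[1] else 1

inference, learner = InferenceModule(), ReinforcementModule()
for step in range(200):
    opponent_action = random.choice((0, 1))       # stand-in opponent
    inference.update(opponent_action)             # module 1: infer
    my_action = learner.act(inference.predict())  # module 2: adapt
```

In the framework itself, both modules are learned with far more sophisticated machinery; the sketch only shows how the inference module’s output feeds the decision step.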
“The challenge was to think about infinity. We had to use a lot of different mathematical tools to enable this and make assumptions to make it work in practice,” says Kim.
Winning in the long run
They tested their approach against other multi-agent reinforcement learning frameworks in several different scenarios, including a pair of robots battling sumo-style and a battle between two teams of 25 agents. In both cases, AI agents using FURTHER won the games more often.
Because their approach is decentralized, meaning agents learn to win games independently, it’s also more scalable than other methods that require a central computer to control agents, Kim says.
The researchers used games to test their approach, but FURTHER could be applied to any kind of multi-agent problem. For example, it could be used by economists seeking to develop sound policy in situations where many interacting entities have behaviors and interests that change over time.
Economics is one application Kim is particularly excited to study. He also wants to deepen the concept of an active equilibrium and continue enhancing the FURTHER framework.
This research is funded, in part, by the MIT-IBM Watson AI Lab.