The goal of achieving so-called artificial general intelligence, or the ability of an engineered system to display human-like general intelligence, is still far in the future. Nevertheless, AI researchers have achieved significant milestones along the way, including AI capable of deep neural reasoning and tactile reasoning, and even AI with basic social skills.
Now, in another step toward AI with more human-like intelligence, researchers from IBM, Massachusetts Institute of Technology, and Harvard University have developed a series of tests that would assess an AI’s ability to use a machine version of “common sense” – or a basic ability to perceive, understand and judge in a way that is shared by almost all humans.
For most people, common sense is not necessarily something that must be explicitly taught, but can be learned from early childhood through trial and error, in order to acquire a practical type of judgment that helps us navigate daily life. Think of babies who quickly learn the laws of physics by constantly manipulating or dropping things to see what happens. In contrast, common sense does not come naturally to machines, as they are constrained by the datasets they are trained with and must follow the rules of their underlying algorithms.
Still, even if AI can’t yet pick up common sense on its own, researchers are keenly interested in finding ways to measure an AI’s basic psychological reasoning ability.
As the research team explained, “For machine agents to successfully interact with humans in real-world contexts, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that lead to observable actions, comes naturally to people: even pre-verbal infants can distinguish agents from objects, expecting agents to act efficiently to achieve goals given constraints. Despite recent interest in machine agents that reason about other agents, it is unclear whether these agents learn or hold the fundamental principles of psychology that drive human reasoning.”
To better assess machine reasoning, the research team created a benchmark called Action, Goal, Efficiency, coNstraint, uTility, or AGENT for short. AGENT consists of a dataset of 3D animations inspired by earlier experiments in cognitive development.
As the IBM researchers explained, the animations show a virtual agent interacting with different objects under different physical constraints: “The videos include separate trials, each including one or more ‘familiarization’ videos of an agent’s typical behavior in a certain physical environment, paired with ‘test’ videos of the same agent’s behavior in a new environment, which are labeled as ‘expected’ or ‘surprising,’ given the agent’s behavior in the corresponding familiarization videos.”
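The trial structure described above can be sketched in Python as follows. The sketch rests on several assumptions. The class names `Video` and `Trial` are hypothetical, and so is the scoring rule. That rule counts a trial as passed when a model rates every "surprising" test video as more surprising than every "expected" one. This illustrates the idea and is not the benchmark's official evaluation code.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Video:
    """One clip. Test videos carry a label; familiarization videos leave it empty."""
    video_id: str
    label: str = ""  # "expected" or "surprising" for test videos

@dataclass
class Trial:
    """Hypothetical container: familiarization clips plus the test clips paired with them."""
    familiarization: List[Video]
    tests: List[Video] = field(default_factory=list)

# A "model" here is any function that rates how surprising a test video is,
# given the familiarization videos that preceded it.
SurpriseFn = Callable[[List[Video], Video], float]

def trial_passed(trial: Trial, surprise: SurpriseFn) -> bool:
    """Assumed pass rule: every 'surprising' test video must be rated higher
    than every 'expected' one."""
    expected = [surprise(trial.familiarization, v) for v in trial.tests if v.label == "expected"]
    surprising = [surprise(trial.familiarization, v) for v in trial.tests if v.label == "surprising"]
    if not expected or not surprising:
        raise ValueError("a trial needs at least one expected and one surprising test video")
    return min(surprising) > max(expected)

def accuracy(trials: List[Trial], surprise: SurpriseFn) -> float:
    """Fraction of trials the model passes."""
    return sum(trial_passed(t, surprise) for t in trials) / len(trials)

# Toy usage: made-up scores stand in for a real model's output.
toy_scores = {"t1": 0.2, "t2": 0.9}

def toy_surprise(familiarization: List[Video], video: Video) -> float:
    return toy_scores[video.video_id]

trial = Trial([Video("f1")], [Video("t1", "expected"), Video("t2", "surprising")])
print(accuracy([trial], toy_surprise))  # 1.0
```

The design choice here is a relative comparison within each trial rather than an absolute threshold. A model therefore only needs to rank the two kinds of test video correctly, not produce calibrated surprise values.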
Inspired by experiments studying cognitive development in children, the AGENT test is structured around the concepts underlying what is called intuitive psychology, which human infants grasp before they learn to speak. These pre-verbal aspects of intuitive psychology include goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs.
For goal preferences, one subset of tests determines whether an AI understands two things. First, virtual agents choose to pursue a particular goal or object based on their preferences. Second, pursuing the same goal under different physical conditions can lead to different actions. For action efficiency, another subset of tests checks whether a model understands that a virtual agent can be physically constrained by its environment and will tend to take the most efficient course of action to achieve its goal.
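One simple way to make the action-efficiency principle concrete is to compare two numbers. The first is the number of moves an agent actually made. The second is the length of the shortest obstacle-free route to its goal, found here by breadth-first search on a grid. Extra moves count as wasted effort, which an observer who expects efficient agents should find surprising. This is an illustrative assumption, not how AGENT's baseline models work. The grid encoding (`#` for an obstacle) and the `inefficiency` score are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)

def shortest_path_length(grid: List[str], start: Cell, goal: Cell) -> Optional[int]:
    """Breadth-first search on a grid where '#' marks an obstacle.
    Returns the number of moves on the shortest path, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

def inefficiency(grid: List[str], observed_path: List[Cell]) -> int:
    """Hypothetical surprise signal: extra moves the agent made beyond the
    shortest route between the first and last cells of its observed path."""
    best = shortest_path_length(grid, observed_path[0], observed_path[-1])
    if best is None:
        raise ValueError("goal is unreachable in this grid")
    return (len(observed_path) - 1) - best

detour = [(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)]
open_room = ["...",
             "...",
             "..."]
walled_room = [".#.",
               "...",
               "..."]
print(inefficiency(open_room, detour))    # 2: two wasted moves in an open room
print(inefficiency(walled_room, detour))  # 0: the same detour is optimal once a wall blocks the direct route
```

The two calls mirror the point made above: the same goal under different physical conditions calls for different actions. A detour that looks wasteful in an open room is perfectly efficient once a wall is present.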
The unobserved-constraints subtest examines whether a model can infer a hidden obstacle from an agent’s observed actions. Finally, the cost-reward trade-off subtest tries to determine whether an AI understands what the agent prefers and whether the agent plans its actions based on utility. It does this by observing how much cost the agent willingly incurs to reach its goal.
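The cost-reward idea can be written as a small utility calculation: an agent that plans on utility should pick the option whose reward minus cost is highest. The sketch below assumes a linear utility (reward minus cost) and uses made-up reward and cost numbers. The AGENT paper's own formulation may differ, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical options: goal name -> (reward to the agent, cost of reaching it)
Options = Dict[str, Tuple[float, float]]

def utility(reward: float, cost: float) -> float:
    """Assumed linear utility: reward minus cost."""
    return reward - cost

def best_option(options: Options) -> str:
    """The goal a utility-maximizing agent would choose."""
    return max(options, key=lambda name: utility(*options[name]))

def is_surprising(options: Options, observed_choice: str) -> bool:
    """Flag a choice as surprising if it yields strictly less utility than the best option."""
    best = utility(*options[best_option(options)])
    return utility(*options[observed_choice]) < best

options = {"preferred": (10.0, 4.0), "other": (5.0, 1.0)}
print(best_option(options))             # preferred (utility 6.0 vs 4.0)
print(is_surprising(options, "other"))  # True
```

Raising the cost of the preferred goal, for example by placing a higher barrier in front of it, can flip the best choice. In that case, the same "other" choice would no longer be surprising.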
After watching these animations, the AI model must rate how surprising the virtual agent’s actions in the “test” videos are, given its behavior in the “familiarization” videos. The AGENT benchmark then validates these ratings against ratings collected from humans who watched the same videos.
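The article does not say exactly how model ratings are compared with human ones. One common choice, assumed here for illustration, is the Pearson correlation between a model's surprise ratings and the average human ratings for the same videos. A value near 1 would mean the model finds the same videos surprising that people do. The `pearson` helper below is a hypothetical, dependency-free implementation.

```python
import math
from typing import Sequence

def pearson(model: Sequence[float], human: Sequence[float]) -> float:
    """Pearson correlation between model surprise ratings and mean human ratings,
    one pair of values per test video."""
    if len(model) != len(human) or len(model) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(model)
    mean_m, mean_h = sum(model) / n, sum(human) / n
    cov = sum((m - mean_m) * (h - mean_h) for m, h in zip(model, human))
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in model))
    sd_h = math.sqrt(sum((h - mean_h) ** 2 for h in human))
    if sd_m == 0 or sd_h == 0:
        raise ValueError("ratings must not be constant")
    return cov / (sd_m * sd_h)

# Hypothetical ratings for four test videos (not real AGENT data).
model_ratings = [0.1, 0.8, 0.3, 0.9]
human_ratings = [1.5, 4.2, 2.0, 4.8]
print(pearson(model_ratings, human_ratings))
```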
Interestingly, the team deliberately kept the dataset relatively small so that an AI could not arrive at the correct answers simply by training on it. “Training from scratch on our dataset will not work. Instead, we suggest that to pass the tests, it is necessary to gain additional knowledge, either through inductive biases in the architectures or from training on additional data,” the researchers explain.
Although the test is still being refined, the team believes AGENT could be a useful diagnostic tool for evaluating and advancing common sense in AI systems. The study also demonstrates the potential of adapting traditional methods from developmental psychology to assess intelligent machines. Measuring an AI’s reasoning abilities matters because we want to know how it will behave in situations that are unpredictable, ambiguous, and not strictly defined by rules.
In such loosely defined situations, some form of self-supervised learning could help AI systems better predict what comes next, even when the available data is of poor quality or unlabeled. This would reduce training time, the need for human supervision, and AI systems’ reliance on massive datasets, in turn increasing efficiency and lowering costs.
Read the team’s paper.