This AI method from MIT and IBM Research improves training and inference performance of deep learning models on large graphs

Graphs, an abstract and nonlinear data type, are frequently used to represent relationships between entities, such as social media connections, organizational hierarchies, financial transactions, etc. Because the amount of data represented as graphs is so large, efficient computational algorithms are needed to process it. A neural network that operates directly on graph structures is known as a Graph Neural Network (GNN). GNNs have become increasingly common in recent years, especially in areas involving social networks, recommendation systems, etc.

Unlike ordinary neural networks, GNNs build mini-batches from multi-hop graph neighborhoods that expand exponentially with the number of network layers, which makes mini-batch construction very computationally expensive and improving the training and inference performance of GNNs quite difficult. To address these issues, MIT researchers collaborated with IBM Research to develop a new technique called SALIENT (SAmpling, sLIcing, and data movemeNT). By addressing three main bottlenecks, their method dramatically shortens the running time of GNNs on huge datasets, even those with billions of nodes and edges. The new approach also scales well as computing capacity is increased from one to sixteen graphics processing units (GPUs).
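
To get a feel for why mini-batch construction is so costly, consider the back-of-the-envelope sketch below. The per-layer fanouts are purely illustrative and are not the settings used in the paper.

```python
# Back-of-the-envelope: how many nodes a single mini-batch seed can pull in
# when each GNN layer samples a fixed number of neighbors (fanout).
# The fanouts below are illustrative, not the settings used by SALIENT.

fanouts = [15, 10, 5]   # neighbors sampled at layers 1, 2, 3

nodes_per_layer = [1]   # start with one seed node
for f in fanouts:
    nodes_per_layer.append(nodes_per_layer[-1] * f)

print("nodes touched per layer:", nodes_per_layer)  # [1, 15, 150, 750]
print("total per seed:", sum(nodes_per_layer))      # 916 nodes for a single seed
```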

The need for SALIENT became evident when the researchers examined the difficulties current systems face in scaling state-of-the-art machine learning techniques for graphs to datasets at the billion scale. Most current research achieves satisfactory performance only on smaller datasets that fit easily into GPU memory. The team’s goal was to build a system capable of handling graphs large enough to represent the entire Bitcoin network, while remaining efficient enough to keep up with the rate at which new data is generated virtually every day.

To build SALIENT, the team first incorporated basic optimizations into components that fit within existing machine learning frameworks, such as PyTorch Geometric and the Deep Graph Library (DGL). The main goal in designing a method that slots easily into existing GNN architectures was to make the work simple for users and domain experts to apply in their own fields, speeding up model training and delivering inference results faster. One design change the team made was to keep all hardware, including CPUs, data links, and GPUs, busy at all times. For example, the GPU can train the machine learning model or perform inference while the CPU samples the graph and assembles mini-batches of data.
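
The sketch below illustrates this division of labor using PyTorch Geometric: CPU worker processes sample neighborhoods and assemble mini-batches while the GPU runs training. It is only a minimal illustration of the general idea on a small stand-in dataset (Cora), not SALIENT's own implementation, whose sampler and slicer are custom parallel code; the fanouts, batch size, and model are arbitrary choices for the example.

```python
# Minimal sketch: CPU workers sample neighborhoods while the GPU trains.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import GraphSAGE

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dataset = Planetoid(root="data/Cora", name="Cora")  # small stand-in dataset
data = dataset[0]

loader = NeighborLoader(
    data,
    num_neighbors=[15, 10],     # per-layer fanouts (illustrative values)
    batch_size=256,
    input_nodes=data.train_mask,
    shuffle=True,
    num_workers=4,              # CPU-side sampling runs in parallel worker processes
    pin_memory=True,            # enables fast, asynchronous host-to-GPU copies
)

model = GraphSAGE(dataset.num_features, 64, num_layers=2,
                  out_channels=dataset.num_classes).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for batch in loader:            # the next batch is sampled while this one trains
    batch = batch.to(device, non_blocking=True)
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    loss = F.cross_entropy(out[:batch.batch_size], batch.y[:batch.batch_size])
    loss.backward()
    optimizer.step()
```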

These simple tweaks allowed the researchers to increase GPU utilization by 10 to 30 percent, resulting in a 1.4 to 2x performance improvement over open-source benchmark codes. Believing they could do even better, the team then set out to examine the bottlenecks that arise early in the data pipeline: the algorithms for graph sampling and mini-batch preparation.

GNNs differ significantly from other neural networks in that they perform a neighborhood aggregation step, computing information about a node from its neighboring nodes. However, as the number of layers in a GNN increases, so does the number of nodes the network must reach, which can push past the limits of a computer. Although some neighborhood sampling techniques use randomization to gain some efficiency, this alone is insufficient. To solve this problem, the team sped up the sampling procedure by roughly a factor of three using a combination of data structures and algorithmic improvements, as in the sketch below.
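
One way to picture the role of data structures here is a per-layer neighbor sampler over a CSR (compressed sparse row) adjacency layout, where each node's neighbor list is a contiguous slice. The Python sketch below is only an illustration of that idea; SALIENT's actual sampler is a fast parallel implementation, and the toy graph and fanouts are invented for the example.

```python
# Per-layer neighbor sampling over a CSR adjacency structure (illustration only).
import numpy as np

def sample_neighborhood(indptr, indices, seeds, fanouts, rng=np.random.default_rng(0)):
    """Return the sampled nodes per layer for a batch of seed nodes."""
    layers = [np.asarray(seeds)]
    frontier = layers[0]
    for fanout in fanouts:
        sampled = []
        for v in frontier:
            neigh = indices[indptr[v]:indptr[v + 1]]   # O(1) slice into the CSR arrays
            if len(neigh) > fanout:
                neigh = rng.choice(neigh, size=fanout, replace=False)
            sampled.append(neigh)
        frontier = (np.unique(np.concatenate(sampled))
                    if sampled else np.array([], dtype=int))
        layers.append(frontier)
    return layers

# Toy graph: 5 nodes, edges stored in CSR form (indptr/indices).
indptr = np.array([0, 2, 4, 6, 7, 8])
indices = np.array([1, 2, 0, 3, 0, 4, 1, 2])
print(sample_neighborhood(indptr, indices, seeds=[0], fanouts=[2, 2]))
```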

The third and final bottleneck was data movement, which the team addressed by adding a prefetching stage that pipelines mini-batch transfers between the CPU and GPU. They also found and fixed a performance issue in a well-known PyTorch module; with that fix, SALIENT reached a running time of 16.5 seconds per epoch. The team believes their meticulous attention to detail is the reason they were able to produce such impressive results: by simply looking carefully at the factors that affect performance when training a GNN, they resolved a significant number of performance issues. Their approach is now limited only by GPU compute, which is what one would expect of an ideal system.
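
The prefetching idea can be sketched as copying the next mini-batch to the GPU on a side CUDA stream while the current mini-batch is being processed, so the transfer overlaps with compute. The snippet below is a minimal illustration of that pattern with dummy tensors, not SALIENT's implementation.

```python
# Prefetching sketch: overlap host-to-GPU copies with GPU compute.
import torch

def prefetching_loop(batches, process_fn, device="cuda"):
    copy_stream = torch.cuda.Stream()

    def to_gpu(batch):
        # Asynchronous host-to-device copy; requires pinned (page-locked) memory.
        with torch.cuda.stream(copy_stream):
            return batch.pin_memory().to(device, non_blocking=True)

    it = iter(batches)
    next_batch = to_gpu(next(it))
    for cpu_batch in it:
        torch.cuda.current_stream().wait_stream(copy_stream)  # copy must finish first
        current = next_batch
        next_batch = to_gpu(cpu_batch)   # next copy starts while we compute
        process_fn(current)              # compute on the current batch
    torch.cuda.current_stream().wait_stream(copy_stream)
    process_fn(next_batch)
    # For brevity this sketch omits record_stream() bookkeeping for tensor lifetimes.

# Example usage with dummy CPU tensors standing in for mini-batches:
if torch.cuda.is_available():
    batches = [torch.randn(1024, 128) for _ in range(8)]
    prefetching_loop(batches, process_fn=lambda b: (b * 2).sum())
```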

Thanks to MIT and IBM’s SALIENT, researchers will now be able to work with graphs of a size never handled before. Looking to the future, the team wants to apply the graph neural network training system both to existing algorithms that predict the properties of each node and to more difficult tasks, such as recognizing patterns of substructures in larger, deeper graphs. Detecting financial crime would be one practical application. The US Air Force Research Laboratory, the US Air Force Artificial Intelligence Accelerator, and the MIT-IBM Watson AI Lab funded the team’s research, and their work was also presented at the MLSys 2022 conference.


Check out the paper and the MIT article. All credit for this research goes to the researchers on this project. Also, don’t forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.


Khushboo Gupta is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing and web development. She likes to learn more about the technical field by participating in several challenges.

