Researchers increase the accuracy and efficiency of a machine learning method that protects user data.

To train a machine learning model to perform a task such as image classification effectively, the model must be shown thousands, millions, or even billions of sample images. Collecting such huge data sets can be especially challenging when privacy is a concern, such as with medical images. Researchers from MIT and the MIT-born startup DynamoFL have now taken a popular solution to this problem, known as federated learning, and made it faster and more accurate.

Federated learning is a collaborative method of training a machine learning model that keeps sensitive user data private. Hundreds or thousands of users each train their own model with their own data on their own device. Then users transfer their models to a central server, which combines them to come up with a better model and sends that improved model back to all users.
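
In code, the server-side combination step of plain federated averaging can look something like the sketch below. This is a minimal NumPy illustration of the general idea, not the researchers' implementation; the function name and the simple mean-based combination are assumptions for the example.

```python
import numpy as np

def average_models(client_weights):
    """Combine per-client weights by element-wise averaging.

    client_weights: a list with one entry per user; each entry is a list of
    NumPy arrays (the layers of that user's locally trained model), all
    sharing the same architecture and therefore the same shapes.
    """
    num_clients = len(client_weights)
    return [sum(layer_group) / num_clients for layer_group in zip(*client_weights)]

# Example: three users, each with a tiny two-layer model trained locally.
rng = np.random.default_rng(0)
clients = [[rng.standard_normal((4, 3)), rng.standard_normal(3)] for _ in range(3)]
global_model = average_models(clients)
# The server would now send `global_model` back to every user for the next round.
```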

For example, a collection of hospitals around the world could use this method to train a machine learning model that identifies brain tumors in medical images, while keeping patient data safe on their local servers.

But federated learning has some drawbacks. Transferring a large machine learning model to and from a central server involves moving a lot of data, which incurs high communication costs, especially since the model has to be sent back and forth tens or even hundreds of times. In addition, each user collects their own data, so that data does not necessarily follow the same statistical patterns, which hinders the performance of the combined model. And that combined model is created by taking an average — it’s not personalized for each user.

The researchers developed a technique that can tackle these three problems of federated learning simultaneously. Their method increases the accuracy of the combined machine learning model while significantly reducing its size, speeding up communication between users and the central server. It also ensures that each user receives a model that is more personalized to their environment, which improves performance.

The researchers were able to reduce the model size by nearly an order of magnitude compared to other techniques, leading to communication costs that were between four and six times lower for individual users. Their technique was also able to increase the overall accuracy of the model by about 10 percent.

“Many papers have addressed one of the problems of federated learning, but the challenge has been to bring it all together. Algorithms that focus only on personalization or communication efficiency are not a good enough solution. We wanted to make sure that we were able to optimize for everything, so this technique could also be used in the real world,” said Vaikkunth Mugunthan PhD ’22, lead author of a paper introducing this technique.

Mugunthan co-authored the article with his advisor, senior author Lalana Kagal, a principal investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL). The work will be presented at the European Conference on Computer Vision.

Cutting a model down to size

The system the researchers developed, called FedLTN, is based on an idea in machine learning known as the lottery ticket hypothesis. This hypothesis says that within very large neural network models, much smaller subnetworks exist that can achieve the same performance. Finding one of these subnetworks is akin to finding a winning lottery ticket. (LTN stands for “lottery ticket network.”)

Neural networks, loosely based on the human brain, are machine learning models that learn to solve problems using interconnected layers of nodes or neurons.

Finding a winning lottery ticket network is more complicated than scratching off a lottery ticket, however. The researchers must use a process called iterative pruning. If the model’s accuracy is above a set threshold, they remove nodes and the connections between them (much like pruning branches from a bush), then test the leaner neural network to see if the accuracy remains above the threshold.
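
The pruning loop itself is straightforward to sketch. The snippet below shows a common form of iterative magnitude pruning as an illustration of the general idea; the helper functions `train` and `evaluate` and the 20 percent pruning rate are assumptions for the example, not details from the paper.

```python
import numpy as np

def prune_smallest(weights, mask, fraction=0.2):
    """Zero out the smallest-magnitude surviving weights in each layer."""
    new_mask = []
    for w, m in zip(weights, mask):
        surviving = np.abs(w[m])                   # magnitudes of weights still in play
        if surviving.size == 0:
            new_mask.append(m)
            continue
        cutoff = np.quantile(surviving, fraction)  # prune the bottom `fraction`
        new_mask.append(m & (np.abs(w) > cutoff))
    return new_mask

def iterative_prune(weights, train, evaluate, accuracy_threshold):
    """Keep pruning as long as accuracy stays above the threshold."""
    mask = [np.ones_like(w, dtype=bool) for w in weights]
    while True:
        candidate = prune_smallest(weights, mask)
        leaner = train([w * m for w, m in zip(weights, candidate)])  # fine-tune the leaner network
        if evaluate(leaner) < accuracy_threshold:   # stop before accuracy drops too far
            return weights, mask                    # keep the last version that passed
        weights, mask = leaner, candidate
```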

Other methods have applied this pruning technique to federated learning, creating smaller machine learning models that can be transferred more efficiently. But while these methods can speed things up, model performance suffers.

Mugunthan and Kagal applied a few new techniques to speed up the pruning process while making the new, smaller models more accurate and personalized for each user.

They accelerated the pruning by avoiding a step in which the remaining parts of the pruned neural network are “rewound” to their original values. They also trained the model before pruning it, which makes it more accurate so it can be pruned at a faster rate, Mugunthan explains.

To make each model more personal to the user’s environment, they made sure not to strip away layers in the network that capture important statistical information about that user’s specific data. In addition, when the models were all combined, they took advantage of information stored on the central server so that it didn’t have to start over with each round of communication.
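
As a rough sketch of that personalization step, the aggregation below skips layers that carry client-specific statistics, so each user keeps their own copy of those layers while the rest of the network is averaged. Marking the personal layers by a `bn` name prefix is a hypothetical convention chosen for illustration; the paper's exact choice of which layers to keep local may differ.

```python
import numpy as np

PERSONAL_PREFIX = "bn"  # hypothetical marker for layers holding client-specific statistics

def aggregate(client_models):
    """Average only the shared layers across clients.

    client_models: a list of dicts mapping layer name -> NumPy array,
    one dict per user, all with the same layer names and shapes.
    """
    shared = {}
    for name in client_models[0]:
        if name.startswith(PERSONAL_PREFIX):
            continue  # personal layers are never averaged; each user keeps their own
        shared[name] = sum(m[name] for m in client_models) / len(client_models)
    return shared

def update_client(client_model, shared):
    """Overwrite only the shared layers; the personal layers stay local."""
    client_model.update(shared)
    return client_model
```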

They also developed a technique to reduce the number of rounds of communication for users with limited resources, such as a smartphone on a slow network. These users start the federated learning process with a leaner model that has already been optimized by a subset of other users.

Winning big with lottery ticket networks

When they put FedLTN to the test in simulations, it resulted in better performance and lower communication costs across the board. In one experiment, a traditional federated learning approach produced a model 45 megabytes in size, while their technique generated a model with the same accuracy that was only 5 megabytes. In another test, a state-of-the-art technique required 12,000 megabytes of communication between users and the server to train a single model, while FedLTN required only 4,500 megabytes.

With FedLTN, the worst-performing clients still saw a performance improvement of more than 10 percent. And the overall accuracy of the model beat the state-of-the-art personalization algorithm by nearly 10 percent, Mugunthan added.

Now that they have developed and refined FedLTN, Mugunthan is working to integrate the technique into a federated learning startup he recently founded, DynamoFL.

He hopes to continue improving this method in the future. For example, the researchers have shown success using labeled datasets, but a greater challenge would be applying the same techniques to unlabeled data, he says.

Mugunthan hopes this work will inspire other researchers to rethink how they approach federated learning.

“This work shows how important it is to think about these issues from a holistic point of view, not just individual metrics that need improvement. Sometimes improving one metric can cause another metric to drop. Instead we should focus on how we can improve a lot of things together, which is really important when it’s deployed in the real world,” he says.