
Perfection is the enemy of AI

Researchers from the University of Michigan have introduced OptiReduce, a communication system that makes AI training on cloud servers faster and more efficient.

Agencias


  • April 30, 2025
  • Updated: April 30, 2025 at 12:10 PM

A research team from the University of Michigan has developed a new collective communication system called OptiReduce, which accelerates artificial intelligence (AI) training and machine learning across multiple cloud servers.

The system sets time limits on communication between servers instead of waiting for every server to finish its task, which makes training large models more efficient.

Distributed deep learning requires multiple servers to work together, but congestion and delays are common in cloud data centers because many jobs run at the same time.

AI models thrive with the OptiReduce communication method

OptiReduce addresses this by introducing time limits that let training proceed without waiting for slower servers to catch up. In shared cloud environments, it reached the target accuracy 70% faster than Gloo and 30% faster than NCCL.
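As a rough illustration of the time-limit idea, the sketch below averages only the gradients that arrive before a deadline and skips stragglers. It is a simplified simulation, not OptiReduce's actual implementation: the function name `bounded_average`, the simulated arrival times, and the 10 ms deadline are all hypothetical.

```python
from typing import List, Sequence, Tuple


def bounded_average(
    gradients: Sequence[Sequence[float]],
    arrival_ms: Sequence[float],
    deadline_ms: float,
) -> Tuple[List[float], int]:
    """Average only the gradients whose (simulated) arrival time meets the deadline.

    Hypothetical helper for illustration; real systems enforce the deadline
    on network transfers rather than on a list of precomputed times.
    """
    on_time = [g for g, t in zip(gradients, arrival_ms) if t <= deadline_ms]
    if not on_time:
        raise ValueError("no gradient arrived before the deadline")
    length = len(on_time[0])
    averaged = [sum(g[i] for g in on_time) / len(on_time) for i in range(length)]
    return averaged, len(on_time)


# Three simulated servers; the third one is a straggler and misses the deadline.
gradients = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
arrival_ms = [5.0, 8.0, 50.0]
result, used = bounded_average(gradients, arrival_ms, deadline_ms=10.0)
print(result, used)
```

In this toy run the step finishes after 10 ms using two of the three servers, rather than stalling for 50 ms until the slowest one reports.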

Although this methodology involves the loss of certain data due to time constraints, OptiReduce uses advanced mathematical techniques to approximate the missing information, thereby minimizing the impact on the final accuracy of the model.
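The article does not name the mathematical techniques involved, so the following is only a stand-in to show the general idea of tolerating gaps: each gradient entry is averaged over whichever servers actually delivered it. The function name `average_with_gaps`, the use of `None` for lost values, and the zero fallback are assumptions made for this sketch.

```python
from typing import List, Optional, Sequence


def average_with_gaps(
    gradients: Sequence[Sequence[Optional[float]]],
) -> List[float]:
    """Per-entry mean over the values that were received (None marks a lost value).

    Illustrative stand-in only; not the technique OptiReduce uses.
    """
    length = len(gradients[0])
    result: List[float] = []
    for i in range(length):
        received = [g[i] for g in gradients if g[i] is not None]
        # If no server delivered this entry, fall back to 0.0 (no update).
        result.append(sum(received) / len(received) if received else 0.0)
    return result


# Three servers; some entries were dropped when the time limit expired.
partial = [
    [1.0, 2.0, None],
    [3.0, None, None],
    [5.0, 6.0, None],
]
estimate = average_with_gaps(partial)
print(estimate)
```

Entries that most servers delivered stay close to the full average, which is the intuition behind accepting some loss in exchange for speed.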

Researchers argue that by accepting “limited reliability,” machine learning jobs can run faster without compromising their accuracy.

In the team's tests, OptiReduce significantly outperformed existing communication systems, and large AI models such as Llama 4 and Gemini proved more resilient to data loss.

The team is also exploring hardware-level solutions to reach communication speeds of hundreds of gigabits per second, a step that could further expand cloud processing capabilities.
