A collaborative, KAUST-led team has boosted the speed of Artificial Intelligence training through a new technology it calls “in-network aggregation”.
The team, which also includes researchers and systems architects from Intel, Microsoft, and the University of Washington, achieved this feat by combining lightweight optimisation code with high-speed network devices, increasing the speed of machine learning on parallelised computer systems five-fold.
Considerable advances have been made in Artificial Intelligence in recent years, owing mainly to improvements in the machine-learning step, the phase in which the technology learns about the world from large quantities of data. Here, a model is trained on substantial sets of labelled training data; the more data the model is rigorously trained and tested against, the better it is likely to perform when confronted with novel inputs.
The machine-learning step is crucial to the evolution of Artificial Intelligence; however, because it usually requires a considerable number of computers running the learning algorithm in parallel, its speed has, until now, been limited.
Marco Canini from the KAUST research team said: “How to train deep-learning models at a large scale is a very challenging problem. The AI models can consist of billions of parameters, and we can use hundreds of processors that need to work efficiently in parallel. In such systems, communication among processors during incremental model updates easily becomes a major performance bottleneck.”
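To make that bottleneck concrete, here is a minimal sketch of data-parallel training (our own Python illustration, not the team’s code; all names and sizes are hypothetical): every incremental model update requires aggregating the gradients from all of the workers, and over a real network that exchange grows with both model size and worker count.

```python
import numpy as np

# Hypothetical illustration of data-parallel training (not the team's code).
# Each worker computes a gradient on its own data shard; before the model
# can be updated, those gradients must be aggregated across all workers.
# Over a network, that aggregation is the communication bottleneck.

NUM_WORKERS = 4
NUM_PARAMS = 8  # real models can have billions of parameters

rng = np.random.default_rng(0)
model = np.zeros(NUM_PARAMS)

def local_gradient(model: np.ndarray) -> np.ndarray:
    """Stand-in for a gradient computed from one worker's data shard."""
    return rng.normal(size=model.shape)

for step in range(3):
    # Gradients are computed in parallel on real systems (serially here).
    grads = [local_gradient(model) for _ in range(NUM_WORKERS)]

    # The costly step: every worker needs the mean (or sum) of all
    # gradients before applying the incremental model update.
    mean_grad = np.mean(grads, axis=0)

    model -= 0.1 * mean_grad  # incremental model update
    print(f"step {step}: |mean grad| = {np.linalg.norm(mean_grad):.3f}")
```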
To overcome this problem, the team turned to new network technology developed by Barefoot Networks, a division of Intel.
Amedeo Sapio, KAUST alumnus and member of the Barefoot Networks team, explained: “We use Barefoot Networks’ new programmable dataplane networking hardware to offload part of the work performed during distributed machine-learning training. Using this new programmable networking hardware, rather than just the network, to move data means that we can perform computations along the network paths.”
During the model update phase of the machine-learning process, the team’s SwitchML platform allows the network hardware itself to perform the data aggregation, which offloads part of the computational work and minimises the amount of data that must be transmitted.
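As a rough intuition for what aggregating inside the network buys, the toy simulation below (a simplified sketch of the idea, not the SwitchML implementation; the chunk size and function names are our own) streams gradients chunk by chunk through a switch-like aggregator, so each worker receives only the element-wise sum rather than every peer’s full gradient.

```python
import numpy as np

# Toy simulation of in-network aggregation (not the SwitchML implementation).
# Workers send gradient chunks to an on-path switch, which adds them
# element-wise and returns only the sum, shrinking the traffic each
# worker must send and receive.

CHUNK = 4  # gradient elements aggregated per "packet"

def switch_aggregate(chunks):
    """The switch's job: element-wise sum of one chunk from every worker."""
    return np.sum(chunks, axis=0)

def in_network_allreduce(worker_grads):
    """Stream gradients chunk by chunk through the simulated switch."""
    n = worker_grads[0].size
    result = np.empty(n)
    for start in range(0, n, CHUNK):
        chunks = [g[start:start + CHUNK] for g in worker_grads]
        result[start:start + CHUNK] = switch_aggregate(chunks)
    return result

grads = [np.full(12, float(w + 1)) for w in range(3)]  # workers send 1s, 2s, 3s
print(in_network_allreduce(grads))  # every element is 1 + 2 + 3 = 6.0
```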
“Although the programmable switch dataplane can do operations very quickly, the operations it can do are limited. So, our solution had to be simple enough for the hardware and yet flexible enough to solve challenges such as limited onboard memory capacity. SwitchML addresses this challenge by co-designing the communication network and the distributed training algorithm, achieving an acceleration of up to 5.5 times compared to the state-of-the-art approach,” said Canini.
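One way to picture that co-design is sketched below (under our own illustrative assumptions; the pool size, chunk size, and fixed-point scale are invented, not SwitchML’s actual parameters): the switch holds only a tiny pool of integer aggregation slots, workers scale their floating-point gradients to integers that switch hardware can sum, and slots are reused as each chunk’s aggregation completes.

```python
import numpy as np

# Illustrative sketch of coping with tiny on-switch memory; the pool size,
# chunk size, and fixed-point scale below are invented, not SwitchML's.

POOL_SLOTS = 2     # aggregation slots held in switch memory
CHUNK = 4          # gradient elements per slot
SCALE = 1 << 16    # floats scaled to integers, which switch ALUs can sum

def to_fixed(x):
    return np.round(x * SCALE).astype(np.int64)

def from_fixed(x):
    return x.astype(np.float64) / SCALE

def pooled_aggregate(worker_grads):
    n = worker_grads[0].size
    out = np.empty(n)
    pool = np.zeros((POOL_SLOTS, CHUNK), dtype=np.int64)  # switch memory
    for i, start in enumerate(range(0, n, CHUNK)):
        slot = i % POOL_SLOTS    # reuse slots once their sum has been sent
        pool[slot] = 0
        for g in worker_grads:   # integer additions, chunk by chunk
            pool[slot] += to_fixed(g[start:start + CHUNK])
        out[start:start + CHUNK] = from_fixed(pool[slot])
    return out

grads = [np.linspace(0, 1, 8) * (w + 1) for w in range(3)]
print(np.allclose(pooled_aggregate(grads), np.sum(grads, axis=0), atol=1e-4))
```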