Presentation Information

[MG1-1] Programmable Fabrics with Optical Switches in AI Supercomputers

○Nikos Terzenidis¹, Giannis Patronas¹, Dimitris Syrivelis¹, Eitan Zahavi², Athanasios Fevgas¹, Nikos Argyris¹, Prethvi Kashinkunti³, Louis Capps³, Zsolt-Alon Wertheimer², Chen Avin², Julie Bernauer³, Elad Mentovich², Paraskevas Bakopoulos¹ (¹NVIDIA Greece, ²NVIDIA Israel, ³NVIDIA USA)
We explore the integration of Optical Circuit Switches (OCSs) into the fabrics of AI/HPC clusters, using a Layer-1 Software-Defined Networking approach to enhance resiliency against failures and to enable dynamic topology reconfiguration for optimized deep-learning training.