AI with Sally Ward-Foxton

Extending The Life Of Copper In AI Training Clusters
AI training clusters are scaling to hundreds of thousands of GPUs and beyond, and the network is under pressure like never before. In this episode, Sally speaks with Don Barnetson, SVP of Product at Credo Semiconductor, about why reliability is crucial when a single crashed training run can cost millions of dollars. Credo has spent nearly two decades developing some of the industry’s fastest SerDes and building active electrical cables (AECs) to keep AI infrastructure running at peak performance. In this conversation, we’ll dig into the networking reliability issues that come with scaling AI, why reliability matters as much as bandwidth, and how Credo is taking on this problem with copper.