
[VIDEO]: Hyve & Intel: A discussion on the versatility of 3rd Gen Intel Xeon Scalable processors and the Catalina platform for Deep Learning and HPC applications.
Q: The 3rd Gen Intel® Xeon® Scalable processor, which we just announced, is another step forward in server computing, with new features such as bfloat16 as well as enhanced capability for 4S and 8S systems. Jay, can you share the high-level thinking behind the Catalina concept?
A: It’s actually based on observations of the development that has been happening over the last 3-4 years. Intel has taken quite a few hardware and software steps, one after the other, to improve the performance of Xeon® for deep learning. First there was the MKL library, then in Skylake there was AVX-512, then there was VNNI. I’m probably missing a few here and there, but there has been a steady series of 2X and 4X improvements. When we heard about bfloat16, which is of course another 2X improvement in training in particular, and also in inference, and when we realized that an 8S system versus a 2S system gives you a 4X advantage in a single Xeon® system, it became clear that all of these multipliers compound to about two orders of magnitude, and that would result in new levels of performance. I think Intel’s umbrella term for all of these is DL Boost, and that’s like an Energizer Bunny that just keeps going. I don’t think we’re seeing the end of it yet, and so that was actually the starting point for us to think about a system like Catalina.
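As a rough illustration of the arithmetic in that answer, the sketch below multiplies the two factors that are quoted explicitly (2X from bfloat16, 4X from going 2S to 8S) and works out how much the earlier steps would have to contribute to reach two orders of magnitude. The constant and function names are made up for this example, and the 100X target is simply "two orders of magnitude" read literally; the individual contributions of MKL, AVX-512, and VNNI are not stated in the interview, so they are not filled in here.

```python
import math

# Factors quoted explicitly in the interview.
BFLOAT16_SPEEDUP = 2.0   # "another 2X improvement in training"
SOCKET_SCALING = 8 / 2   # 8S versus 2S: "a 4X advantage"

# "Two orders of magnitude", read literally as 100X (an assumption).
CLAIMED_TOTAL = 100.0


def combined_speedup(factors):
    """Independent speedups multiply rather than add."""
    return math.prod(factors)


quoted = combined_speedup([BFLOAT16_SPEEDUP, SOCKET_SCALING])
remaining = CLAIMED_TOTAL / quoted

print(f"bfloat16 x socket scaling: {quoted:.1f}X")
print(f"implied from earlier steps (MKL, AVX-512, VNNI, ...): {remaining:.1f}X")
```

Under these assumptions the two quoted factors give 8X on their own, so the earlier software and instruction-set steps would need to supply roughly another 12.5X between them for the total to reach 100X.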
Q: That was a great summary of Deep Learning Boost. How does that line up with deep learning use cases? Can you share a bit more?
A: The more interesting story is actually on the use case side, not in the hardware and software capability itself. If you missed Misha Smelyanskiy’s talk from the OCP Summit last year, it is definitely worth watching. Misha is from Facebook, and what he narrated was that DL applications are a spectrum: at one end of the spectrum there are the very matrix-math-intensive use cases, and at the other end there are the logic-heavy use cases. Something like image recognition falls on the math-heavy side. Initially, when we started deep learning, we were all fascinated with ResNet benchmarking, and that was the only benchmarking people understood. But it’s only one use case. When you go and explore the other use cases, you see that complex logic is involved, and reinforcement learning is probably a really good example of that type of use case. A good CPU actually turns out to do quite well at deep learning, and Intel® Xeon® Gen III is definitely an example of a really good CPU for this.
Q: Yes, I did hear about the MiniGo benchmark on a single Catalina system coming in at under 45 minutes. Did the Catalina team do any other benchmarking that reflects the AI benefits of the new Intel® Xeon® generation?
A: Yes, we did. I just dissed ResNet benchmarking a few minutes ago, but that’s exactly what we did with Catalina also. On a single Catalina system, one system, we trained a ResNet model using TensorFlow on the standard ImageNet dataset at over 750 images per second. So when you put a rack of this type of system together, a single rack goes at more than 10,000 images per second. Basically, even for image recognition you don’t need hundreds of Xeon® systems; you just need one rack to go at more than 10,000 images per second. That’s the other benchmarking that we did with Catalina.
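The per-system and per-rack figures in that answer can be related with a quick back-of-the-envelope calculation. This is only a sketch: it assumes throughput scales linearly with the number of systems (the interview does not say how many systems are in a rack or how well training scales across them), and the function names and the `scaling_efficiency` parameter are hypothetical.

```python
import math

# Figures quoted in the interview (both are stated as lower bounds).
PER_SYSTEM_IMAGES_PER_SEC = 750      # "over 750 images per second" per system
RACK_TARGET_IMAGES_PER_SEC = 10_000  # "more than 10,000 images per second" per rack


def systems_needed(target, per_system):
    """Smallest whole number of systems that meets the target, assuming linear scaling."""
    return math.ceil(target / per_system)


def rack_throughput(n_systems, per_system, scaling_efficiency=1.0):
    """Aggregate images/sec; scaling_efficiency < 1.0 models multi-system overhead."""
    return n_systems * per_system * scaling_efficiency


n = systems_needed(RACK_TARGET_IMAGES_PER_SEC, PER_SYSTEM_IMAGES_PER_SEC)
print(f"systems per rack needed: {n}")
print(f"rack throughput at perfect scaling: {rack_throughput(n, PER_SYSTEM_IMAGES_PER_SEC):.0f} images/sec")
```

At exactly 750 images per second per system and perfect scaling, about 14 systems would reach the 10,000 images per second mark; a lower `scaling_efficiency` raises that count.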
Q: Hey, that’s a remarkable result. So can you use Xeon systems like Catalina for deep learning in addition to high performance computing (HPC), analytics, and general compute?
A: Yes, you can. In fact, we call it fungible computing, and this type of flexibility was the original hypothesis behind the design of Catalina. Along with Xeon Gen III, we also put a heavy dose of embedded 100 GbE in Catalina, with the view of optimizing the end-to-end data flow across the whole system.
Q: So basically from a customer application perspective, you can use a Catalina CPU-only system for a lot of your computing needs, is that correct?
A: Yeah, that’s exactly the way to look at it. One system to do it all, or at least most of it, and that’s the way to look at the versatility of the Xeon Scalable Gen 3 and of the Catalina platform.