The heterogeneous compute era is no longer coming; it's here. As AI workloads flood into HPC environments, systems must simultaneously manage CPUs running legacy simulation code, GPUs handling deep learning, and, increasingly, TPUs purpose-built for systolic array operations. At HPSFCon 2026, Phani Pendurthi of Mastercard brought an industry practitioner's perspective to this challenge.
“CPU, GPU, and TPU Co-Scheduling: Architectural Tradeoffs for HPC Performance, Energy, and Cost” examined what it takes to run all three processor types together efficiently. The differences are profound: CPUs execute MIMD workloads, GPUs use SIMT parallelism, and TPUs rely on systolic arrays. Each comes with its own memory hierarchy (NUMA for CPUs, HBM for GPUs, on-chip SRAM for TPUs) and its own programming abstractions.
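To make those contrasts concrete, here is a minimal sketch (illustrative, not from the talk) that records each processor class's execution model and fast-memory tier as plain data, the kind of descriptor a co-scheduler might consume. The capacity figures are placeholders, not vendor specs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Illustrative descriptor for one processor class (all fields hypothetical)."""
    name: str
    exec_model: str        # how the hardware parallelizes work
    fast_memory: str       # the memory tier a scheduler must keep hot
    fast_memory_gb: float  # placeholder capacity; real values vary by part


# Placeholder numbers chosen only to show the orders-of-magnitude gap.
DEVICES = [
    Device("cpu", exec_model="MIMD",           fast_memory="NUMA DRAM",    fast_memory_gb=512.0),
    Device("gpu", exec_model="SIMT",           fast_memory="HBM",          fast_memory_gb=80.0),
    Device("tpu", exec_model="systolic array", fast_memory="on-chip SRAM", fast_memory_gb=0.128),
]

if __name__ == "__main__":
    for d in DEVICES:
        print(f"{d.name}: {d.exec_model}, fast tier = {d.fast_memory} (~{d.fast_memory_gb} GB)")
```

A static scheduler would read descriptors like these once at job launch; a dynamic one would consult them on every placement decision.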
The talk surveyed three scheduling paradigms: static (resources partitioned before the job runs), semi-static (partitions adjusted at coarse intervals), and dynamic allocation (placement decided per task at runtime). Pendurthi's experience at Mastercard grounded each approach in real-world system behavior. Performance portability, energy efficiency, and cost aren't independent variables, and the talk was explicit about the tradeoffs among them.
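As a toy illustration of the dynamic end of that spectrum, the sketch below scores each device for a single task by a weighted blend of estimated runtime, energy, and dollar cost. The profiles, weights, and cost model are all assumptions made up for this example, not Pendurthi's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical per-device estimates for one task type (made-up numbers)."""
    name: str
    runtime_s: float      # estimated wall-clock time for the task
    watts: float          # average power draw while running
    usd_per_hour: float   # amortized or cloud hourly price


def score(d: DeviceProfile, w_time: float, w_energy: float, w_cost: float) -> float:
    """Weighted objective: lower is better. Energy is power times time."""
    energy_j = d.watts * d.runtime_s
    dollars = d.usd_per_hour * d.runtime_s / 3600.0
    return w_time * d.runtime_s + w_energy * energy_j + w_cost * dollars


def pick_device(candidates, w_time=1.0, w_energy=0.01, w_cost=100.0):
    """Dynamic placement: choose the device minimizing the blended score."""
    return min(candidates, key=lambda d: score(d, w_time, w_energy, w_cost))


if __name__ == "__main__":
    # Invented estimates for one training step of some model.
    candidates = [
        DeviceProfile("cpu", runtime_s=90.0, watts=250.0, usd_per_hour=0.50),
        DeviceProfile("gpu", runtime_s=4.0,  watts=700.0, usd_per_hour=4.00),
        DeviceProfile("tpu", runtime_s=3.0,  watts=450.0, usd_per_hour=3.50),
    ]
    best = pick_device(candidates)
    print(f"placing task on {best.name}")
```

Re-weighting the objective can change the winner, which is the point: the fastest device for a task is not automatically the cheapest or the most energy-efficient one.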
View the full playlist from HPSFCon 2026: https://www.youtube.com/playlist?list=PLRKq_yxxHw29oZTboj6fmdYhQMWHUaj4u.