Matrix multiplication is a widely used kernel in high-performance scientific computing, gaming, and, more recently, machine learning workloads. Companies like NVIDIA build GPU hardware that excels at the ...