Keynote and Invited Speakers


  • Submission: August 20th, 2022
  • Notification: September 6th, 2022
  • Final Pre-Workshop papers: October 1st, 2022



The Evolution of Parallel Computing since LCPC’88

David Padua
Abstract: An overview of how hardware, programming notations, and compilers for parallel computing have evolved over the past 35 years, and of the impact of the 1988 state of the art on parallel computing today.
Retire Linear Algebra Libraries

Albert Cohen
Abstract: Despite decades of investment in software infrastructure, scientific computing, signal processing, and machine learning systems remain stuck in a rut. Some numerical computations are more equal than others: BLAS and the core operations for neural networks achieve near-peak performance, while marginally different variants do not get this chance. As a result, performance is achieved only at the expense of a dramatic loss of programmability. Compilers are obviously the cure. But which compilers? How should they be built, deployed, retargeted, and autotuned? Sure, the BLAS API is not the ultimate interface for composing and reusing high-performance operations, but then, what would be a better one? And why have we not yet built and agreed on one? In this talk, we will review these questions and some of the proposed solutions. In particular, we will advocate for a new tile-level programming interface sitting between the top-level computational operations and the generators of target- and problem-specific code. We will also advocate for a structured approach to constructing domain-specific code generators for tensor compilers, with the stated goal of improving the productivity of both compiler engineers and end users.
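To make the idea of a tile-level interface concrete, here is a minimal Python sketch, assuming a design in which a top-level operation (matrix multiplication) is expressed purely as a schedule of calls to one tile-level kernel. The names `tile_matmul_acc` and `matmul_tiled` are hypothetical and do not come from the talk; in the kind of interface advocated above, the tile kernel would be produced by a target- and problem-specific code generator, whereas here it is a plain reference loop.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def tile_matmul_acc(a: Matrix, b: Matrix, c: Matrix,
                    i0: int, j0: int, k0: int, ti: int, tj: int, tk: int) -> None:
    """Hypothetical tile-level kernel: C[i, j] += A[i, k] * B[k, j] over one tile.

    Tile bounds are clipped to the matrix edges. In a real system this body
    would come from a target-specific code generator; here it is a reference loop.
    """
    n, p, m = len(a), len(b), len(b[0])
    for i in range(i0, min(i0 + ti, n)):
        for k in range(k0, min(k0 + tk, p)):
            aik = a[i][k]
            for j in range(j0, min(j0 + tj, m)):
                c[i][j] += aik * b[k][j]


def matmul_tiled(a: Matrix, b: Matrix, tile: Tuple[int, int, int] = (2, 2, 2)) -> Matrix:
    """Top-level operation expressed only as a schedule of tile-kernel calls."""
    n, p, m = len(a), len(b), len(b[0])
    ti, tj, tk = tile
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, ti):
        for j0 in range(0, m, tj):
            for k0 in range(0, p, tk):
                tile_matmul_acc(a, b, c, i0, j0, k0, ti, tj, tk)
    return c
```

For example, multiplying [[1, 2, 3], [4, 5, 6]] by [[7, 8], [9, 10], [11, 12]] gives [[58.0, 64.0], [139.0, 154.0]] for any `tile` tuple: changing the tile shape changes only the schedule of kernel calls, not the result, which is the sort of freedom such an interface leaves to code generators and autotuners.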
Bio: Albert is a research scientist at Google. An alumnus of École Normale Supérieure de Lyon and the University of Versailles, he was a research scientist at Inria from 2000 to 2018, a visiting scholar at the University of Illinois, an invited professor at Philips Research, and a visiting scientist at Facebook Artificial Intelligence Research. Albert Cohen works on parallelizing and optimizing compilers, parallel and synchronous programming languages, and machine learning compilers, with applications to high-performance computing, artificial intelligence, and reactive control.
Compiler 2.0

Saman Amarasinghe
Abstract: When I was a graduate student a long time ago, I had intense conversations with, and learned a lot from, my peers in other areas of computer science, because the program structure, systems, and algorithms used in my compiler were very similar to, and often inspired by, their work. For example, a natural language recognition system developed by my peers, a single sequential program whose multiple passes, connected through IRs, systematically transformed an audio stream into text, was structurally similar to the SUIF compiler I was developing. In the intervening 30 years, the information revolution has brought us unprecedented advances in algorithms (e.g., machine learning and solvers), systems (e.g., multicores and cloud computing), and program structure (e.g., serverless and low-code frameworks). Thus, a modern NLP system such as Apple’s Siri or Amazon’s Alexa, a thin client on an edge device interfacing with a massively parallel, cloud-based, centrally trained deep neural network, bears little resemblance to its predecessors. However, the SUIF compiler is still eerily similar to state-of-the-art modern compilers such as LLVM or MLIR. What happened to compiler construction technology? At worst, as a community, we have been Luddites in the face of the information revolution, even though our technology has been critical to it. At best, we have been unable to transfer our research innovations (e.g., the polyhedral method or program synthesis) into production compilers. In this talk, I hope to inspire the compiler community to radically rethink how next-generation compilers are built, by giving a few examples of how 21st-century program structures, algorithms, and systems could be used to construct a compiler.
Bio: Saman Amarasinghe is a Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology and a member of its Computer Science and Artificial Intelligence Laboratory (CSAIL), where he leads the Commit compiler group. Under Saman’s guidance, the Commit group developed the StreamIt, PetaBricks, Halide, Simit, MILK, Cimple, TACO, GraphIt, BioStream, CoLa, and Seq programming languages and compilers; the DynamoRIO, Helium, Tiramisu, Codon, StreamJIT, and BuildIt compiler/runtime frameworks; Superword Level Parallelism (SLP), goSLP, VeGen, and SuperVectorizer for vectorization; the Ithemal machine-learning-based performance predictor; Program Shepherding, which protects programs against external attacks; the OpenTuner extensible autotuner; and the Kendo deterministic execution system. He was the co-leader of the Raw architecture project. Saman was a co-founder of Determina, Lanka Internet Services, Venti Technologies, and DataCebo. He received his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D. from Stanford University in 1990 and 1997, respectively. He is an ACM Fellow.
Portable Compilation of Sparse Computation

Fredrik Kjolstad
Abstract: Hardware is becoming ever more complicated, and architects are developing a fleet of new types of accelerators. I will talk about compiling collection-oriented programs to heterogeneous hardware. I will discuss properties that make certain programming abstractions amenable to portable compilation, give some examples, and describe a programming system design. I will then describe how to compile one such programming model (operations on sparse and dense arrays/tensors) to the major types of hardware: CPUs, fixed-function accelerators, GPUs, distributed machines, and streaming dataflow accelerators. Finally, I will briefly discuss how verification may make heterogeneous machines easier to program.
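As a small illustration of what separating an algorithm from its data representation means for sparse arrays, the sketch below computes the same matrix-vector product y = A·x over a dense nested list and over the compressed sparse row (CSR) format. It is a hand-written Python example that assumes CSR as the sparse format; the names `CSR`, `dense_to_csr`, `spmv_dense`, and `spmv_csr` are illustrative only. A sparse tensor compiler aims to generate format-specific loops like these from one high-level expression instead of requiring a separate hand-written kernel per format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CSR:
    nrows: int
    ncols: int
    row_ptr: List[int]   # length nrows + 1; row i's nonzeros are at [row_ptr[i], row_ptr[i+1])
    col_idx: List[int]   # column index of each stored nonzero
    vals: List[float]    # value of each stored nonzero


def dense_to_csr(a: List[List[float]]) -> CSR:
    """Convert a dense matrix to CSR, storing only the nonzero entries."""
    row_ptr, col_idx, vals = [0], [], []
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(vals))
    return CSR(len(a), len(a[0]) if a else 0, row_ptr, col_idx, vals)


def spmv_dense(a: List[List[float]], x: List[float]) -> List[float]:
    """y = A x, visiting every entry, including zeros."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def spmv_csr(a: CSR, x: List[float]) -> List[float]:
    """y = A x, visiting only the stored nonzeros."""
    y = [0.0] * a.nrows
    for i in range(a.nrows):
        for p in range(a.row_ptr[i], a.row_ptr[i + 1]):
            y[i] += a.vals[p] * x[a.col_idx[p]]
    return y
```

Both functions compute the same mathematical operation; only the iteration structure differs, and that structure is dictated by the data layout rather than by the algorithm.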
Bio: Fredrik Kjolstad is an Assistant Professor in Computer Science at Stanford University. He works on topics in compilers, programming models, and systems, with an emphasis on fast compilation and on compilers for sparse computing problems, in which the algorithms must be separated from the data representation. He has received the NSF CAREER Award, the MIT EECS First Place George M. Sprowls PhD Thesis Award in Computer Science, the Rosing Award, an Adobe Fellowship, a Google Research Scholarship, and three best/distinguished paper awards.
Towards Compiler-Driven Algorithm-Architecture Co-Design for Energy-Efficient ML Accelerators

Ponnuswamy Sadayappan
Abstract: Improving the energy efficiency of ML accelerators is of fundamental importance. The energy expended in accessing data from DRAM, SRAM, and registers is orders of magnitude higher than that expended in actually performing the arithmetic operations on the data. The total energy expended in executing an ML operator depends both on the choice of accelerator design parameters (such as the capacities of register banks and scratchpad buffers) and on the “dataflow”: the schedule of data movement and operation execution. The design space of architectural parameters and dataflows is extremely large. This talk will discuss how analytical modeling can be used to co-design accelerator parameters and dataflow to minimize energy.
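A toy analytical model can show why accelerator parameters and dataflow have to be chosen together. The Python sketch below assumes a square n×n matrix multiplication tiled with t×t tiles, a scratchpad sized to hold exactly one tile each of A, B, and C, and hypothetical per-access energies in arbitrary units (placeholders, not measurements). Register-level reuse is ignored. Larger tiles cut DRAM traffic, roughly 2n³/t + n² words, but under the assumed cost function `e_sram` they also make every scratchpad access more expensive, so the lowest-energy tile size lies in between. This is an illustrative model, not the one presented in the talk.

```python
import math
from typing import Iterable

# Hypothetical per-access energies in arbitrary units (placeholders, not measurements).
E_MAC = 1.0     # one multiply-accumulate
E_DRAM = 200.0  # one DRAM word access


def e_sram(capacity_words: int) -> float:
    """Assumed scratchpad access energy, growing with the buffer's capacity."""
    return 2.0 * math.sqrt(capacity_words / 1024)


def matmul_energy(n: int, t: int) -> float:
    """Toy analytical energy model for an n x n x n matmul tiled with t x t tiles."""
    assert n % t == 0, "sketch assumes the tile size divides n"
    capacity = 3 * t * t                   # scratchpad holds one tile each of A, B, C
    macs = n ** 3
    dram_words = 2 * n ** 3 // t + n * n   # A and B tiles re-fetched for every output tile; C written once
    sram_words = 2 * macs                  # one A and one B operand read per MAC (C accesses ignored)
    return macs * E_MAC + dram_words * E_DRAM + sram_words * e_sram(capacity)


def best_tile(n: int, candidates: Iterable[int]) -> int:
    """Exhaustively search a tiny design space for the lowest-energy tile size."""
    return min(candidates, key=lambda t: matmul_energy(n, t))
```

With these placeholder constants, `best_tile(256, [8, 16, 32, 64, 128])` returns 32: smaller tiles pay for extra DRAM traffic, larger ones for costlier scratchpad accesses. Different cost assumptions move the optimum, which is the kind of trade-off a joint search over buffer capacities and dataflow explores.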

Invited Speakers

GPU Collectives with MSCCL: Man vs. Dragons

Saeed Maleki
Abstract: Collective communication primitives on GPUs are the primary bottleneck for large neural network models. Although there have been decades of research on optimizing computation kernels, very little has been done for collective communication kernels on GPUs. This area poses many challenges, including unique GPU interconnection topologies, high P2P transfer latency, a wide range of neural network use cases, and software complexity. In this talk, I will present program synthesis as a primary solution for generating communication algorithms for these topologies and show how a bespoke algorithm can significantly improve the overall performance of a model. Lastly, I will present a high-level DSL, along with a compiler that maps an abstract synthesized algorithm to low-level CUDA code for collective communication.
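For readers new to collectives, the sketch below simulates the classic ring AllReduce (sum) in plain Python: a reduce-scatter phase followed by an all-gather phase, each taking n-1 steps on n ranks. It is a standard hand-designed algorithm chosen only for illustration; it is not one of MSCCL's synthesized algorithms, and it uses neither MSCCL's DSL nor any CUDA or NCCL API. Roughly speaking, a synthesized algorithm would instead decide which chunks travel over which links at each step to fit a particular GPU topology.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum AllReduce over len(buffers) ranks connected in a ring.

    Each rank's buffer is split into n equal chunks; rank r only ever sends
    to rank (r + 1) % n. Returns the final buffer of every rank.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "sketch assumes the buffer length divides evenly into n chunks"
    c = size // n
    data = [list(buf) for buf in buffers]  # copy so the inputs are not modified

    def chunk(rank: int, k: int) -> List[float]:
        return data[rank][k * c:(k + 1) * c]  # slicing returns a copy

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) % n and the
    # receiver adds it in. After n - 1 steps, rank r owns the full sum of chunk (r + 1) % n.
    for s in range(n - 1):
        # Build all messages first so every send in a step uses pre-step data.
        msgs = [(r, (r - s) % n, chunk(r, (r - s) % n)) for r in range(n)]
        for r, k, payload in msgs:
            dst = (r + 1) % n
            for i, v in enumerate(payload):
                data[dst][k * c + i] += v

    # Phase 2, all-gather: at step s, rank r forwards chunk (r + 1 - s) % n,
    # which is already fully reduced, and the receiver overwrites its copy.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n, chunk(r, (r + 1 - s) % n)) for r in range(n)]
        for r, k, payload in msgs:
            dst = (r + 1) % n
            data[dst][k * c:(k + 1) * c] = payload
    return data
```

For example, `ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])` leaves every rank holding [111, 222, 333]. Each rank sends 2(n-1) chunks of size 1/n of the buffer, which is why the ring is bandwidth-efficient on uniform links but can be a poor fit for irregular GPU interconnects.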
How Can Compilers Help The Additive Manufacturing of Electronics?

Xiaoming Li