Ongoing Research

The following is a list of the current research efforts being conducted under the ALLVM umbrella.  The techniques we are exploring will be applicable in varying degrees to any software shipped as virtual instruction set code, but software written in static languages is likely to benefit the most.  In particular, several large and important categories of software are written in static languages and do not currently benefit from virtual instruction sets, including high-performance computing (HPC) software, systems software, mobile applications for iOS, and embedded software.  Through our research, we hope to bring the benefits of compiling with virtual instruction sets to all these classes of software.  We also hope to develop new classes of techniques (e.g., IPC optimizations, dynamic autovectorization, novel autotuning approaches and automated fault diagnosis) that benefit all classes of software.

HPVM: Parallel program compilation and performance portability

The first goal of this work is to design suitable abstractions of parallelism for representing parallel computations both in virtual instruction sets and in the compiler intermediate representations that manipulate them.  We have prototyped a design called Heterogeneous Parallel Virtual Machine (HPVM) as an extension of the LLVM IR.  A second goal is to develop code generation, optimization and autotuning techniques that take advantage of the HPVM abstractions to generate high-performance code for a wide range of parallel hardware, including emerging heterogeneous parallel systems.  A third goal is to implement existing and new parallel programming languages (both general-purpose languages like OpenMP and OpenCL and domain-specific languages) using these compiler capabilities.

Cross-package software optimization

Led by Sean Bartell

The scope of compiler optimizations has been increasing over the past few decades, and sophisticated interprocedural optimizations and link-time optimizations are now commonplace. However, compilers are still unable to optimize across the boundaries between different dynamic libraries, programs and dynamic libraries, and programs from different software packages. This project is enabling new optimizations to be performed across these boundaries, with a particular emphasis on code size optimizations.

Verified Translation

Led by Theodoros Kasampalis

One drawback of shipping code in virtual instruction set form is that the final machine code cannot be tested directly.  This is especially problematic for C/C++ code with undefined behaviors, which may lead to different, and unpredictable, results for different compilers and even for different optimization sequences.  Our goal is to develop a certified code generation strategy based on Translation Validation from virtual instruction set to native machine code, so that developers can test their code before shipping and have the same confidence in its behavior that they would have in well-tested native machine code.  To achieve this goal, we must carry out translation validation between two different languages, which has never been done before.  Moreover, we must develop techniques to obtain predictable semantics for programs with undefined behaviors and ensure that back-end code generators respect such semantics.
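To make the undefined-behavior concern concrete, here is a small C sketch.  The function names are hypothetical and chosen only for illustration; they are not part of any ALLVM tool.  The first function relies on signed overflow, which the C standard leaves undefined, so a compiler may legally simplify the comparison to "always true" at one optimization level while producing a wrapping addition at another.  The second function expresses the same intent with fully defined semantics.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example: for x == INT_MAX the addition overflows, which is
   undefined behavior.  An optimizer may assume overflow never happens and
   fold this to "return true", while unoptimized code may wrap and return
   false.  For every other x the result is well defined (true). */
bool next_is_larger_ub(int x)
{
    return x + 1 > x;
}

/* Same intent, but defined for every input: x + 1 is larger than x
   exactly when x + 1 is representable, i.e. when x is not INT_MAX. */
bool next_is_larger_defined(int x)
{
    return x != INT_MAX;
}
```

Because the result of the first function at INT_MAX can depend on the code generator, testing the virtual instruction set code alone says little about what the shipped machine code will do for that input.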

Debloating and System Specialization

The goal of this project is to specialize and co-optimize an entire software stack (application, libraries, operating system) for a given configuration description.  There are several contexts in which such specialization can be especially fruitful:

  • High-performance computing (HPC) applications built on complex libraries for portability: We are exploring libraries including Kokkos, GASNet, libOMP, and Open MPI.
  • Docker containers containing a fixed set of predefined software components, often with a single primary application and its libraries.  Such containers inherently define a very narrow use case for the libraries and operating system interfaces, providing the potential for aggressive specialization.
  • Software whose behavior is extensively customized via configuration files defined at install time or startup time.  By treating the configuration data as constants, we hope to propagate the values aggressively and achieve more efficient, specialized code.
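The last item, treating configuration data as constants, can be sketched in a few lines of C.  This is only an illustration under stated assumptions: the struct config type, its fields, and both function names are hypothetical, and the sketch does not describe how the ALLVM tools perform the specialization.  The generic routine consults the configuration on every loop iteration; the specialized wrapper binds it to a constant, which lets an optimizing compiler inline the routine, fold the multiplication by 1, and delete the checksum branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration record.  In a real deployment these values
   would come from a configuration file read at install or startup time. */
struct config {
    int use_checksum;  /* nonzero: mix the byte index into the result */
    int scale;         /* multiplier applied to every input byte */
};

/* Generic version: the configuration is inspected at run time. */
uint32_t process(const struct config *cfg, const uint8_t *buf, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (uint32_t)buf[i] * (uint32_t)cfg->scale;
        if (cfg->use_checksum)
            acc ^= (uint32_t)i;
    }
    return acc;
}

/* Configuration frozen as a constant (assumed values for illustration). */
static const struct config fixed_cfg = { .use_checksum = 0, .scale = 1 };

/* Specialized version: same result as process(&fixed_cfg, ...), but the
   constant configuration is visible to the compiler, so the branch and
   the multiplication can be removed after inlining. */
uint32_t process_specialized(const uint8_t *buf, size_t n)
{
    return process(&fixed_cfg, buf, n);
}
```

Here the specialization is written by hand for one function; the research goal described above is to apply the same idea automatically across the application, its libraries, and the operating system interfaces.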