Learning LLVM

  • Build LLVM (run inside the build directory; plain make and make -j4 are alternatives, the latter running 4 jobs in parallel)
CC=/software/gcc-4.8.2/bin/gcc CXX=/software/gcc-4.8.2/bin/g++ cmake -G "Unix Makefiles" /path/to/llvm/src
make
make -j4
  • Compile the C file into a native executable
% clang hello.c -o hello
  • The -emit-llvm option can be used with the -S or -c options to emit an LLVM .ll or .bc file (respectively) for the code
% clang -O3 -emit-llvm hello.c -c -o hello.bc

Code Generation Options

-O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -O, -O4
Specify which optimization level to use:

-O0 Means “no optimization”: this level compiles the fastest and generates the most debuggable code.

  • The LLVM Project also provides tools to convert the on-disk format from text to binary: llvm-as assembles the textual .ll file into a .bc file containing the bitcode goop and llvm-dis turns a .bc file into a .ll file.
llvm-dis < hello.bc | less

 

The -load option specifies that opt should load your pass as a shared object, which makes “-hello” a valid command line argument (which is one reason you need to register your pass). Because the Hello pass does not modify the program in any interesting way, we just throw away the result of opt (sending it to /dev/null).

[weiyang3@seclab-central helloExample]$ opt -load /home/weiyang3/schoolWorkspace/tool/LLVM/mybuilddir/lib/LLVMHello.so -hello < hello.bc > /dev/null 
Hello: main

To see what happened to the other string you registered, try running opt with the -help option:

[weiyang3@seclab-central helloExample]$ opt -load /home/weiyang3/schoolWorkspace/tool/LLVM/mybuilddir/lib/LLVMHello.so -help | less
...
 -hello - Hello World Pass
 -hello2 - Hello World Pass (with getAnalysisUsage implemented)
...

Once you get it all working and tested, it may become useful to find out how fast your pass is. The PassManager provides a nice command line option (-time-passes) that allows you to get information about the execution time of your pass along with the other passes you queue up. For example:

[weiyang3@seclab-central helloExample]$ opt -load /home/weiyang3/schoolWorkspace/tool/LLVM/mybuilddir/lib/LLVMHello.so -hello -time-passes < hello.bc > /dev/null

-internalize: Marks functions and global variables not explicitly marked external as internal, enabling more optimizations to be applied to them. This is safe if you have a complete module for a program, except perhaps for some external libraries that will be linked in later.

[weiyang3@seclab-central helloExample]$ opt -internalize  < struct.bc > internalize.bc
[weiyang3@seclab-central structExample]$ llvm-dis < internalize.bc > internalize.ll
[weiyang3@seclab-central structExample]$ diff 1.ll internalize.ll
15c15
< define void @read_line(i8* nocapture %Str) #0 {
---
> define internal void @read_line(i8* nocapture %Str) #0 {
59c59
< define void @print_employee(%struct.person* byval nocapture readonly align 8 %Emp) #0 {
---
> define internal void @print_employee(%struct.person* byval nocapture readonly align 8 %Emp) #0 {
90c90
< define i32 @main() #0 {
---
> define internal i32 @main() #0 {

 

Although LLVM requires all register values to be in SSA form, it does not require (or permit) memory objects to be in SSA form. In the G and H example, note that the loads from G and H are direct accesses to G and H: they are not renamed or versioned. This differs from some other compiler systems, which do try to version memory objects. In LLVM, instead of encoding dataflow analysis of memory into the LLVM IR, it is handled with Analysis Passes, which are computed on demand.

Notice how the type of the @G/@H global variables is actually “i32*” even though the variable is defined as “i32”. What this means is that @G defines space for an i32 in the global data area, but its name actually refers to the address of that space. Stack variables work the same way, except that instead of being declared with global variable definitions, they are declared with the LLVM alloca instruction.

LLVM IR

LLVM identifiers come in two basic types: global and local. Global identifiers (functions, global variables) begin with the '@' character. Local identifiers (register names, types) begin with the '%' character.

 

The ‘alloca’ instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. It allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack and returns a pointer of the appropriate type to the program. If a constant alignment is specified, the returned address is guaranteed to be aligned to at least that boundary.

 define i32 @main() #0 {
 entry:
 %This_Employee25 = alloca %struct.person, align 8
 %This_Employee = alloca %struct.person, align 8
...

 

Load

load type, pointer

<result> = load [volatile] <ty>, <ty>* <pointer>

Store

store value, pointer

store [volatile] <ty> <value>, <ty>* <pointer>

 

Terms:

A virtual register is an abstract location that can hold a single scalar value. Virtual registers are introduced by a compiler when it generates intermediate code. They represent placeholders for physical registers or memory locations, by which they are replaced at code generation time.

 

Environment Setup:

Methods:

iterator_range<user_iterator> llvm::Value::users ( )

iterator_range: A range adaptor for a pair of iterators. This just wraps two iterators into a range-compatible interface.

user_iterator: typedef user_iterator_impl<User> user_iterator;

User: This class defines the interface that anything using a Value must implement. Each instance of the Value class keeps track of which Users hold handles to it.

Value: This is a very important LLVM class. It is the base class of all values
computed by a program that may be used as operands to other values. Value is
the super class of other important classes such as Instruction and Function.
All Values have a Type. Type is not a subclass of Value. Some values can
have a name and they belong to some Module. Setting the name on the Value
automatically updates the module’s symbol table.

Every value has a “use list” that keeps track of which other Values are
using this Value. A Value can also have an arbitrary number of ValueHandle
objects that watch it and listen to RAUW and Destroy events. See
llvm/IR/ValueHandle.h for details.

DbgDeclareInst – This represents the llvm.dbg.declare instruction.

llvm.dbg.declare: void @llvm.dbg.declare(metadata, metadata, metadata)

This intrinsic provides information about a local element (e.g., variable). The first argument is metadata holding the alloca for the variable. The second argument is a local variable containing a description of the variable. The third argument is a complex expression.

DbgValueInst – This represents the llvm.dbg.value instruction.

llvm.dbg.value: void @llvm.dbg.value(metadata, i64, metadata, metadata)

This intrinsic provides information when a user source variable is set to a new value. The first argument is the new value (wrapped as metadata). The second argument is the offset in the user source variable where the new value is written. The third argument is a local variable containing a description of the variable. The fourth argument is a complex expression.

BitCastInst – This class represents a no-op cast from one type to another.

GetElementPtrInst – an instruction for type-safe pointer arithmetic to access elements of arrays and structs.