
After semantic analysis, we have an abstract syntax tree (AST) that is syntactically valid and semantically meaningful. However, the AST shouldn't be converted directly to assembly code: an AST contains nested expressions, which assembly instructions cannot express, and AST structures vary greatly between languages. The solution is to use an Intermediate Representation (IR).

In this phase, the AST is transformed into a simpler and more uniform representation of the program.

For simplicity, Neko uses Three-Address Code (TAC / 3AC) as its IR.

<aside>

An IR can provide a common target for multiple source languages and a common starting point for assembly generation for multiple target architectures. A great example of this is the LLVM compiler framework. One can create a new programming language and just compile it to the LLVM IR, and LLVM will take care of optimisation and code generation for different CPU architectures.

</aside>

Three-Address Code (TAC / 3AC)

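In 3AC, each instruction has at most three addresses (usually two operands and one result) and at most one operator on its right-hand side.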

For example, 3AC for var x = 5; will be:

t0 = 5
x = t0
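
To make the shape of a 3AC instruction concrete, here is a minimal Python sketch of how one might be stored inside a compiler. The `Instr` class and its field names are hypothetical and chosen only for illustration; they are not taken from Neko's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    """One 3AC instruction: result = arg1 [op arg2]. (Hypothetical layout.)"""
    result: str
    arg1: str
    op: Optional[str] = None    # None means a plain copy: result = arg1
    arg2: Optional[str] = None

    def __str__(self) -> str:
        if self.op is None:
            return f"{self.result} = {self.arg1}"
        return f"{self.result} = {self.arg1} {self.op} {self.arg2}"


# 3AC for `var x = 5;` as a list of instructions
code = [Instr("t0", "5"), Instr("x", "t0")]
lines = [str(instr) for instr in code]
print("\n".join(lines))
```

Each instruction names at most three addresses (`result`, `arg1`, `arg2`) and at most one operator, which is where the name "three-address code" comes from.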

Let's look at some more examples to get a better understanding of 3AC:

// Examples

/*
Expression:
  x = a + b * c + d
*/
t0 = b * c
t1 = a + t0
t2 = t1 + d
x = t2
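
The expression example above can be reproduced with a post-order walk over the AST: lower both operands first, then emit one instruction for the operator and store its result in a fresh temporary. The sketch below assumes a toy AST made of nested tuples `(op, left, right)` with strings as leaves; this encoding and the name `lower_assignment` are assumptions for illustration, not Neko's actual data structures.

```python
from itertools import count


def lower_assignment(target, expr):
    """Flatten a nested expression into 3AC lines, then assign it to `target`."""
    code = []
    temps = count()  # numbers for the temporaries t0, t1, t2, ...

    def lower(node):
        # Leaves (variable names or literals) are already valid addresses.
        if isinstance(node, str):
            return node
        op, left, right = node
        left_addr = lower(left)      # operands first (post-order) ...
        right_addr = lower(right)
        temp = f"t{next(temps)}"     # ... then one instruction per operator
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    result = lower(expr)
    code.append(f"{target} = {result}")
    return code


# x = a + b * c + d, parsed as (a + (b * c)) + d
ast = ("+", ("+", "a", ("*", "b", "c")), "d")
expr_code = lower_assignment("x", ast)
print("\n".join(expr_code))
```

Operator precedence is already encoded in the shape of the tree, so the lowering step never has to reason about it.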

/*
if-else statement:
  if (a < b)
      x = 1;
  else
      x = 2;  
*/
if a < b goto L0
goto L1
L0:
x = 1
goto L2
L1:
x = 2
L2:

/*
while loop:
  while (i < 10)
      i = i + 1;
*/
L0:
if i < 10 goto L1
goto L2
L1:
t0 = i + 1
i  = t0
goto L0
L2:
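
Both control-flow examples above follow a fixed template of labels and jumps wrapped around code that has already been lowered. A minimal sketch of those templates in Python, assuming the condition and the bodies are passed in as ready-made 3AC strings (the helper names are hypothetical):

```python
from itertools import count


def make_label_gen():
    """Return a function that hands out L0, L1, L2, ... on each call."""
    n = count()
    return lambda: f"L{next(n)}"


def lower_if_else(cond, then_code, else_code, new_label):
    then_l, else_l, end_l = new_label(), new_label(), new_label()
    return [
        f"if {cond} goto {then_l}",
        f"goto {else_l}",
        f"{then_l}:",
        *then_code,
        f"goto {end_l}",      # skip over the else branch
        f"{else_l}:",
        *else_code,
        f"{end_l}:",
    ]


def lower_while(cond, body_code, new_label):
    start_l, body_l, end_l = new_label(), new_label(), new_label()
    return [
        f"{start_l}:",
        f"if {cond} goto {body_l}",
        f"goto {end_l}",      # condition failed: leave the loop
        f"{body_l}:",
        *body_code,
        f"goto {start_l}",    # jump back and re-check the condition
        f"{end_l}:",
    ]


if_code = lower_if_else("a < b", ["x = 1"], ["x = 2"], make_label_gen())
while_code = lower_while("i < 10", ["t0 = i + 1", "i = t0"], make_label_gen())
print("\n".join(if_code + while_code))
```

Each call here gets a fresh label generator so that the output starts at `L0`, as in the examples. Within a single program, one shared generator would be needed to keep labels unique.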

/*
function:
  function add(var a, var b) {
      return a + b;
  }
*/
add:
t0 = a + b
return t0

/*
function call:
  x = add(a, b);
*/
param a
param b
t0 = call add, 2
x  = t0
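
The function-call example follows a similar template: one `param` line per argument, then a `call` line, then a copy of the result into the target. The sketch below reads the trailing `2` in `call add, 2` as the argument count, which is the usual textbook convention but an assumption about Neko. The helper name `lower_call` and its `temp` parameter are hypothetical.

```python
def lower_call(target, func, args, temp="t0"):
    """Emit 3AC for `target = func(args...)`, with args already lowered."""
    code = [f"param {arg}" for arg in args]              # pass arguments in order
    code.append(f"{temp} = call {func}, {len(args)}")    # callee name, arg count
    code.append(f"{target} = {temp}")                    # copy the return value
    return code


call_code = lower_call("x", "add", ["a", "b"])
print("\n".join(call_code))
```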