5. Code Generation

[Figure: Code Generation]

This phase is fairly straightforward: it converts the previously generated IR into target assembly.

In this project, we target x86_64 assembly using NASM syntax. This choice is pragmatic. x86_64 is widely used, well documented, and supported by standard Linux toolchains.


From IR to Assembly

[Figure: Code Generator]

(The real output contains far more than the four instructions shown in the diagram above; they are only there for visualisation.)

The output assembly is divided into sections. These sections describe how memory is laid out in the final executable.

The .data section stores initialised global data, such as format strings used for printing.

The .bss section stores uninitialised global variables.

The .text section contains executable code. This is where the main label and all generated instructions live.
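The following minimal C++ sketch shows how a generator might stitch the three sections into one NASM file. The name `build_program` is hypothetical. So are the `fmt` label, the choice to reserve every variable and temporary as an 8-byte `resq 1` slot in .bss, and the `xor eax, eax` / `ret` epilogue. None of these is confirmed as the project's actual implementation. The format string is only declared here; no call to `printf` is emitted.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: assembles a complete NASM source file from the list of
// IR variable/temporary names and the already-generated .text instructions.
std::string build_program(const std::vector<std::string>& variables,
                          const std::vector<std::string>& text_lines)
{
    std::string out;

    // .data: initialised globals, e.g. a printf-style format string "%ld\n".
    out += "section .data\n";
    out += "    fmt db \"%ld\", 10, 0\n";

    // .bss: one uninitialised 8-byte slot per variable/temporary, so that a
    // memory operand such as [x] refers to storage that actually exists.
    out += "section .bss\n";
    for (const std::string& name : variables)
        out += "    " + name + " resq 1\n";

    // .text: executable code, starting at the main label.
    out += "section .text\n";
    out += "    global main\n";
    out += "main:\n";
    for (const std::string& line : text_lines)
        out += "    " + line + "\n";

    // Return 0 from main.
    out += "    xor eax, eax\n";
    out += "    ret\n";

    return out;
}
```

Keeping .bss slots 8 bytes wide matches the 64-bit `rax` moves used by the generator, so every load and store of a variable transfers exactly one slot.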

See: x86_64 Reference Sheet


// ...

case ir::OpCode::ASSIGN:
{
    // emit() appends one instruction to the output vector.
    // x86_64 mov cannot copy memory to memory, so the value goes through rax.
    emit("mov rax, " + map_operand(*inst.arg1));
    emit("mov " + map_operand(*inst.result) + ", rax");
    break;
}

// ...

std::string CodeGenerator::map_operand(const ir::Operand& op)
{
    if (op.type == ir::OperandType::VARIABLE ||
        op.type == ir::OperandType::TEMPORARY)
    {
        // Brackets make NASM treat the name as a memory reference,
        // i.e. the value stored at the variable's address.
        return "[" + op.name + "]";
    }

    if (op.type == ir::OperandType::CONSTANT)
    {
        // Without brackets, NASM treats the operand as an immediate value.
        return op.value;
    }

    // Any other operand kind is emitted by name as-is.
    return op.name;
}
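The fragments above depend on IR types that are not shown. The sketch below fills them in with hypothetical, simplified stand-ins so that the ASSIGN translation and `map_operand` can be compiled and run in isolation. The stand-ins are `OperandType` (including an assumed `LABEL` kind to exercise the fallback branch), `OpCode`, `Operand` and `Instruction`. The sketch also stores `arg1`/`result` by value instead of through the pointer-like fields the project dereferences. It mirrors the logic shown above and is not the project's actual class.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the project's ir:: types.
enum class OperandType { VARIABLE, TEMPORARY, CONSTANT, LABEL };
enum class OpCode { ASSIGN };

struct Operand {
    OperandType type;
    std::string name;   // used by variables, temporaries and labels
    std::string value;  // used by constants, kept as text
};

struct Instruction {
    OpCode op;
    Operand arg1;
    Operand result;
};

class CodeGenerator {
public:
    // Translates every IR instruction and returns the emitted assembly lines.
    std::vector<std::string> generate(const std::vector<Instruction>& ir)
    {
        for (const Instruction& inst : ir) {
            switch (inst.op) {
            case OpCode::ASSIGN:
                // mov cannot copy memory to memory, so route through rax.
                emit("mov rax, " + map_operand(inst.arg1));
                emit("mov " + map_operand(inst.result) + ", rax");
                break;
            }
        }
        return output;
    }

private:
    std::vector<std::string> output;

    // Appends one instruction to the output vector.
    void emit(const std::string& line) { output.push_back(line); }

    std::string map_operand(const Operand& op)
    {
        if (op.type == OperandType::VARIABLE || op.type == OperandType::TEMPORARY)
            return "[" + op.name + "]";  // memory operand
        if (op.type == OperandType::CONSTANT)
            return op.value;             // immediate operand
        return op.name;                  // any other kind, emitted by name
    }
};
```

For the IR sequence `x = 5` followed by `t1 = x`, `generate` returns `mov rax, 5`, `mov [x], rax`, `mov rax, [x]` and `mov [t1], rax`. Note how the constant is emitted as an immediate while variables and temporaries become bracketed memory operands.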