Lexical analysis is the first phase of compilation. Its job is to convert raw source code (text) into a stream of tokens that the parser can understand. This process is called tokenisation.
But why do we need tokens at all? If the compiler tried to parse source code directly, every later phase would have to work with raw characters instead of meaningful units.

That would make things unnecessarily complex. This is why the lexer exists: to separate concerns. That separation is one of the most important design decisions in compiler construction.
A token is a single meaningful unit in your source code. Think of it as breaking down a sentence into words.
Example:
var x = 5;
This gets broken down into the following tokens:
- VAR (keyword)
- IDENTIFIER (x)
- EQUAL (=)
- NUMBER (5)
- SEMICOLON (;)

Each token has a type and a value.
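To make "type and value" concrete, here is a minimal sketch in TypeScript of a lexer that handles just this one statement. The `TokenType`, `Token`, and `tokenize` names are illustrative choices for this example, not the API of any particular compiler.

```ts
// Token kinds used in the example "var x = 5;".
enum TokenType {
  VAR = "VAR",
  IDENTIFIER = "IDENTIFIER",
  EQUAL = "EQUAL",
  NUMBER = "NUMBER",
  SEMICOLON = "SEMICOLON",
}

interface Token {
  type: TokenType; // what kind of token this is
  value: string;   // the exact source text (lexeme) it came from
}

// A tiny hand-rolled lexer that only understands the pieces in "var x = 5;".
function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;

  while (i < source.length) {
    const ch = source[i];

    if (ch === " " || ch === "\t" || ch === "\n") {
      i++; // skip whitespace; the parser never sees it
    } else if (ch === "=") {
      tokens.push({ type: TokenType.EQUAL, value: "=" });
      i++;
    } else if (ch === ";") {
      tokens.push({ type: TokenType.SEMICOLON, value: ";" });
      i++;
    } else if (/[0-9]/.test(ch)) {
      const start = i;
      while (i < source.length && /[0-9]/.test(source[i])) i++;
      tokens.push({ type: TokenType.NUMBER, value: source.slice(start, i) });
    } else if (/[a-zA-Z_]/.test(ch)) {
      const start = i;
      while (i < source.length && /[a-zA-Z0-9_]/.test(source[i])) i++;
      const text = source.slice(start, i);
      // A keyword is just an identifier with a reserved spelling.
      const type = text === "var" ? TokenType.VAR : TokenType.IDENTIFIER;
      tokens.push({ type, value: text });
    } else {
      throw new Error(`Unexpected character '${ch}' at position ${i}`);
    }
  }

  return tokens;
}

console.log(tokenize("var x = 5;"));
// [ { type: "VAR", value: "var" }, { type: "IDENTIFIER", value: "x" },
//   { type: "EQUAL", value: "=" }, { type: "NUMBER", value: "5" },
//   { type: "SEMICOLON", value: ";" } ]
```

Notice that whitespace simply disappears during tokenisation: it separates tokens but never becomes one, which is exactly the kind of detail the later phases no longer have to think about.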