Lexical analysis is the first phase of compilation. Its job is to convert raw source code (text) into a stream of tokens that the parser can understand. This process is called tokenisation.
But why do we need tokens at all? If the compiler tried to parse source code directly, every later phase would have to work with raw characters instead of meaningful units.

That would make things unnecessarily complex. This is why the lexer exists: to separate concerns. That separation is one of the most important design decisions in compiler construction.
A token is a single meaningful unit in your source code. Think of it as breaking down a sentence into words.
Example:
var x = 5;
This gets broken down into the following tokens:
- VAR (keyword)
- IDENTIFIER (x)
- EQUAL (=)
- NUMBER (5)
- SEMICOLON (;)

Each token has a type and a value.
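To make "type and value" concrete, here is a minimal sketch in TypeScript of a lexer that handles just this one statement. The `TokenType`, `Token`, and `tokenize` names are illustrative choices for this example, not the API of any particular compiler.

```ts
// Token kinds used in the example "var x = 5;".
enum TokenType {
  VAR = "VAR",
  IDENTIFIER = "IDENTIFIER",
  EQUAL = "EQUAL",
  NUMBER = "NUMBER",
  SEMICOLON = "SEMICOLON",
}

interface Token {
  type: TokenType; // what kind of token this is
  value: string;   // the exact source text (lexeme) it came from
}

// A tiny hand-rolled lexer that only understands the pieces in "var x = 5;".
function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;

  while (i < source.length) {
    const ch = source[i];

    if (ch === " " || ch === "\t" || ch === "\n") {
      i++; // skip whitespace; the parser never sees it
    } else if (ch === "=") {
      tokens.push({ type: TokenType.EQUAL, value: "=" });
      i++;
    } else if (ch === ";") {
      tokens.push({ type: TokenType.SEMICOLON, value: ";" });
      i++;
    } else if (/[0-9]/.test(ch)) {
      const start = i;
      while (i < source.length && /[0-9]/.test(source[i])) i++;
      tokens.push({ type: TokenType.NUMBER, value: source.slice(start, i) });
    } else if (/[a-zA-Z_]/.test(ch)) {
      const start = i;
      while (i < source.length && /[a-zA-Z0-9_]/.test(source[i])) i++;
      const text = source.slice(start, i);
      // A keyword is just an identifier with a reserved spelling.
      const type = text === "var" ? TokenType.VAR : TokenType.IDENTIFIER;
      tokens.push({ type, value: text });
    } else {
      throw new Error(`Unexpected character '${ch}' at position ${i}`);
    }
  }

  return tokens;
}

console.log(tokenize("var x = 5;"));
// [ { type: "VAR", value: "var" }, { type: "IDENTIFIER", value: "x" },
//   { type: "EQUAL", value: "=" }, { type: "NUMBER", value: "5" },
//   { type: "SEMICOLON", value: ";" } ]
```

Notice that whitespace simply disappears during tokenisation: it separates tokens but never becomes one, which is exactly the kind of detail the later phases no longer have to think about.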