Skip to content

Lecture No. 09

Dated: 17-03-2025

DFA1 Minimization

The generated DFA1 can have large number of states and can be reduced using Hopcroft's algorithm which is to merge the equivalent states into groups.

cs606_e_9_1.svg

cs606_e_9_2.svg

Optimized acceptor.2

Lexical Analyzers

Lexical Analyzers3 (scanners3) use the same mechanism but have character stream as input and multiple regular expressions4 for multiple tokens5 and returns a sequence of matching tokens.5
It always returns the longest one.

Lexical Analyzer Generators

The process of constructing a lexical analyzer can be automated. We only need to specify Regular expressions for tokens and rules for assigning priorities for multiple longest match cases, e.g, == and =, == is longer.

There are two famous ones.

  1. Flex: For C and C++. It is the modern version of Lex, a tool in AT&T bell labs version of UNIX.
  2. Jlex: For Java.

Using Flex

Flex takes an input file which consists of 3 sections.

sections: C or C++ and flex definitions
%%
token definitions and actions
%%
user code

The <!-- separates the sections.

Example

Contents of lex.l file

%{
#include "tokdefs.h"
#include <iostream>
%}

D [0-9]
L [a-zA-Z_]
id {L}({L}|{D})*

-->

"void" {return(TOK_VOID);}
"int" {return(TOK_INT);}
"if" {return(TOK_IF);}
"else" {return(TOK_ELSE);}
"while" {return(TOK_WHILE);}
"<=" {return(TOK_LE);}
">=" {return(TOK_GE);}
"==" {return(TOK_EQ);}
"!=" {return(TOK_NE);}
{D}+ {return(TOK_DECIMAL);}
{id} {return(TOK_ID);}

%%

int main() {
    int tc = yylex();

    while(tc != 0) {
        std::cout << tc << "," << yytext << std::endl;
        tc = yylex();
    }

    return 0;
}

int yywrap () {
    return 1;
}

Contents of tokdefs.h are

#define TOK_VOID 1
#define TOK_INT 2 
#define TOK_IF 3 
#define TOK_ELSE 4 
#define TOK_WHILE 5 
#define TOK_LE 6 
#define TOK_GE 7 
#define TOK_EQ 8 
#define TOK_NE 9 
#define TOK_DECIMAL 10 
#define TOK_ID 111
flex lex.l
g++ lex.yy.c -o scanner.exe
./scanner.exe

References


  1. Read more about finite automaton

  2. Read more about acceptor

  3. Read more about scanners

  4. Read more about regular expressions

  5. Read more about tokens