Software for Lexical Analysis

Abstract: Lex, a popular lexical analyzer generator, supports writing programs whose control flow is directed by instances of regular expressions in a given input stream. It is well suited to editor-script type transformations and to segmenting input in preparation for a parsing routine. Lexical analysis programs written with Lex can accept ambiguous specifications and choose the longest match possible at each input point.

Index terms: Phases of a compiler, Structure of a Lex file, Lex regular expressions, Operators, Lex predefined variables, Lex library routines, Conclusion.


Lex is a program generator designed for lexical processing of character input streams. It accepts a high-level, problem-oriented specification for character string matching and produces a program in a general-purpose language that recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to Lex.

The code written by Lex recognizes these expressions in an input stream and partitions the input stream into strings matching the expressions.
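The longest-match behavior mentioned in the abstract can be illustrated with a small hand-written C sketch. This is not code generated by Lex: the function names (match_keyword, match_word, pick_rule) and the two rules are hypothetical, chosen only to show how a scanner might prefer the rule that consumes the most characters. Ties go to the earlier rule, which is also how Lex resolves them.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rule 0 (hypothetical): the literal keyword "integer".
   Returns the match length, or 0 when the input does not start with it. */
static size_t match_keyword(const char *s)
{
    const char *kw = "integer";
    size_t n = strlen(kw);
    return strncmp(s, kw, n) == 0 ? n : 0;
}

/* Rule 1 (hypothetical): one or more lower case letters, i.e. [a-z]+. */
static size_t match_word(const char *s)
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

/* Try every rule at the start of s and keep the longest match.
   Returns the index of the winning rule, or -1 if nothing matches;
   *len receives the match length. */
int pick_rule(const char *s, size_t *len)
{
    size_t (*rules[])(const char *) = { match_keyword, match_word };
    int best = -1;
    *len = 0;
    for (int i = 0; i < 2; i++) {
        size_t n = rules[i](s);
        if (n > *len) {          /* strictly longer only: ties keep the earlier rule */
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With this sketch, the input "integer x" selects the keyword rule with length 7, while "integers" selects the word rule because it matches 8 characters.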


Basic functions of a Lexical Analyzer

  • To group sequences of characters into lexemes, the smallest meaningful units in a language (keywords, identifiers, constants).
  • To read characters from a file and store them in a buffer, which helps reduce the latency caused by I/O. The lexical analyzer manages this buffer.
  • To make use of the theory of regular languages and finite state machines.
  • Lex is a tool that constructs lexical analyzers from regular expression specifications.


Lex source is a table of regular expressions and corresponding program fragments. The table is translated into a program which reads an input stream, copying it to an output stream and partitioning the input into strings which match the given expressions. As each such string is recognized, the corresponding program fragment is executed. The recognition of the expressions is performed by a deterministic finite automaton generated by Lex.

The program fragments written by the user are executed in the order in which the corresponding regular expressions occur in the input stream.

Lex turns the user's expressions and actions (called source) into the host general-purpose language; the generated program is named yylex. The yylex program recognizes expressions in a stream (called input) and performs the specified actions for each expression as it is detected.
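To make the idea of recognition by a deterministic finite automaton concrete, here is a minimal table-driven sketch in C for the single expression [A-Za-z][A-Za-z0-9]*. It is written by hand for illustration and does not reproduce the tables Lex actually generates; the state names, the classify helper and scan_identifier are all assumptions of this example.

```c
#include <ctype.h>
#include <stddef.h>

enum { START, IN_ID, DEAD };      /* DFA states; IN_ID is the accepting state */
enum { LETTER, DIGIT, OTHER };    /* input character classes */

/* Transition table: next_state[state][character class]. */
static const int next_state[3][3] = {
    /* START */ { IN_ID, DEAD,  DEAD },
    /* IN_ID */ { IN_ID, IN_ID, DEAD },
    /* DEAD  */ { DEAD,  DEAD,  DEAD },
};

/* Map a character to the class used as the table's column index. */
static int classify(int c)
{
    if (isalpha(c)) return LETTER;
    if (isdigit(c)) return DIGIT;
    return OTHER;
}

/* Run the DFA from the start of s and return the length of the longest
   prefix that leaves it in the accepting state (0 if there is none). */
size_t scan_identifier(const char *s)
{
    int state = START;
    size_t accepted = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = next_state[state][classify((unsigned char)s[i])];
        if (state == DEAD)
            break;
        accepted = i + 1;   /* the only live state after a step is IN_ID */
    }
    return accepted;
}
```

A generated analyzer would hold one combined automaton for all rules and run the matching action at each accepting point; this sketch only shows the table lookup at the core of that loop.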

  • The general format of Lex source is: definitions, then %%, then rules, then %%, then user subroutines. The definitions and the user subroutines are often omitted. The second %% is optional, but the first is required to mark the beginning of the rules. The minimal Lex program is therefore:
  • %% (no definitions, no rules)
  • It translates into a program which copies the input to the output unchanged.
  • In this outline of Lex programs, the rules represent the user's control decisions. They form a table whose left column contains regular expressions and whose right column contains actions, the program fragments to be executed when the expressions are recognized by the analyzer.
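The behavior of the minimal program can be pictured with a few lines of plain C. This is only an equivalent written by hand, not what Lex emits, and copy_stream is a hypothetical name introduced here.

```c
#include <stdio.h>

/* Copy every character from in to out unchanged, which is the visible
   effect of a Lex program that has no rules. Returns the number of
   characters copied. */
long copy_stream(FILE *in, FILE *out)
{
    long count = 0;
    int c;
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        count++;
    }
    return count;
}
```

Calling copy_stream(stdin, stdout) from a main function would give a filter that echoes its input, matching the description above.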


An individual rule might be integer printf("found keyword INT"); which looks for the string integer in the input stream and prints the message "found keyword INT" whenever it appears. In this example the host procedural language is C, and the C library function printf is used to print the string. The end of the expression is indicated by the first blank or tab character. If the action is merely a single C expression, it can just be given on the right side of the line; if it is compound, or takes more than a line, it should be enclosed in braces.

Suppose it is desired to change a number of words from British to American spelling. Lex rules such as colour printf("color"); mechanise printf("mechanize"); petrol printf("gas"); would be a start. These rules are not quite enough, however, since the word petroleum would become gaseum, which is wrong.
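The petroleum problem is easy to reproduce outside Lex. The following C sketch applies a plain substring replacement with no notion of word boundaries; replace_all is a hypothetical helper written for this illustration, not a library function.

```c
#include <string.h>

/* Replace every occurrence of from in src with to, writing at most
   cap - 1 characters plus a terminating NUL into dst. No word
   boundaries are checked, which is exactly the weakness being shown. */
void replace_all(const char *src, const char *from, const char *to,
                 char *dst, size_t cap)
{
    size_t flen = strlen(from), tlen = strlen(to), used = 0;
    if (cap == 0)
        return;
    while (*src != '\0') {
        if (flen > 0 && strncmp(src, from, flen) == 0) {
            if (used + tlen >= cap)
                break;                  /* no room left for the replacement */
            memcpy(dst + used, to, tlen);
            used += tlen;
            src += flen;
        } else {
            if (used + 1 >= cap)
                break;                  /* no room left for one more character */
            dst[used++] = *src++;
        }
    }
    dst[used] = '\0';
}
```

Replacing petrol with gas turns "petroleum" into "gaseum", so a usable rule set needs some way to restrict the match to whole words.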


Some of the predefined variables used by the Lex analyzer are:

  • yytext - a string containing the lexeme
  • yyleng - the length of the lexeme
  • yyin - the input stream pointer
  • the default input of the default main() is stdin
  • yyout - the output stream pointer
  • the default output of the default main() is stdout.
  • cs20: % ./a.out < inputfile > outfile


A regular expression specifies a set of strings to be matched. It contains text characters (which match the corresponding characters in the strings being compared) and operator characters (which specify repetitions, choices, and other features). The letters of the alphabet and the digits are always text characters; thus the regular expression integer matches the string integer wherever it appears and the expression a89F looks for the string a89F. The operator characters are " \ [ ] ^ - ? . * + | ( ) $ / { } % < > and each must be escaped when it is meant as a text character.

  • The quotation mark operator (") indicates that whatever is contained between a pair of quotes is to be taken as text characters. Thus xyz"++" matches the string xyz++ when it appears.
  • Thus by quoting every non-alphanumeric character being used as a text character, the user can avoid remembering the list above of current operator characters, and is safe should further extensions to Lex lengthen the list.
  • An operator character can also be turned into a text character by preceding it with \ as in xyz\+\+ which is another, less readable, equivalent of the above expressions.
  • Any blank character not contained within [] must be quoted. Several normal C escapes with \ are recognized: \n is newline, \t is tab, and \b is backspace. To enter \ itself, use \\. Since newline is illegal in an expression, \n must be used; it is not required to escape tab and backspace. Every character but blank, tab, newline and the list above is always a text character.

Character classes

  • Classes of characters can be specified using the operator pair []. The construction [abc] matches a single character, which may be a, b, or c. Within square brackets, most operator meanings are ignored.
  • Only three characters are special: \, - and ^. The - character indicates ranges. For example, [a-z0-9<>_] indicates the character class containing all the lower case letters, the digits, the angle brackets, and underline. Ranges may be given in either order. Using - between any pair of characters which are not both upper case letters, both lower case letters, or both digits is implementation dependent and will get a warning message.
  • If it is desired to include the character - in a character class, it should be first or last; thus [-+0-9] matches all the digits and the two signs. In character classes, the ^ operator must appear as the first character after the left bracket; it indicates that the resulting string is to be complemented with respect to the computer character set. Thus [^abc] matches all characters except a, b, or c, including all special or control characters, and [^a-zA-Z] is any character which is not a letter.
  • The \ character provides the usual escapes within character class brackets.
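The class rules above (ranges with -, a literal - when first or last, complement with a leading ^) can be checked with a short C sketch. The function in_class is hypothetical and takes the class body without its enclosing brackets. It is a simplified model of the notation, not Lex's own implementation, and it does not handle \ escapes.

```c
#include <stdbool.h>
#include <string.h>

/* Test whether c belongs to a class written in the notation of the text,
   given without brackets: "a-z0-9<>_", "-+0-9", "^abc".
   Backslash escapes are not handled in this sketch. */
bool in_class(const char *cls, unsigned char c)
{
    bool negate = (cls[0] == '^');   /* ^ is special only in first position */
    if (negate)
        cls++;
    size_t n = strlen(cls);
    bool found = false;
    for (size_t i = 0; i < n && !found; i++) {
        unsigned char lo = (unsigned char)cls[i];
        if (i + 2 < n && cls[i + 1] == '-') {
            /* A '-' with a character on each side denotes a range. */
            unsigned char hi = (unsigned char)cls[i + 2];
            if (lo > hi) {           /* ranges may be given in either order */
                unsigned char t = lo;
                lo = hi;
                hi = t;
            }
            found = (c >= lo && c <= hi);
            i += 2;
        } else {
            /* Any other character, including a first or last '-', is literal. */
            found = (c == lo);
        }
    }
    return negate ? !found : found;
}
```

For instance, in_class("-+0-9", '-') is true because the leading - is literal, and in_class("^abc", '\n') is true because the complement includes control characters.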

Arbitrary character

  • To match almost any character, the operator character . is the class of all characters except newline.
  • Escaping into octal is possible although non-portable.

E.g. [\40-\176] matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde).
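As a quick check of the octal arithmetic, the same range can be written in C, where octal constants carry a leading 0. The function name is hypothetical, and the sketch assumes an ASCII execution character set.

```c
#include <stdbool.h>

/* True when c lies in the octal range \40 to \176 from the text.
   In C, 040 is decimal 32 (blank) and 0176 is decimal 126 (tilde). */
bool in_octal_printable_range(int c)
{
    return c >= 040 && c <= 0176;
}
```

The neighbours on either side, octal 37 and octal 177 (DEL), are control characters and fall outside the range.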

Optional expressions

The operator ? indicates an optional element of an expression. E.g. ab?c matches either ac or abc.

Repeated expressions

  • Repetitions of classes are indicated by the operators * and +. The expression a* is any number of consecutive a characters, including zero, while a+ is one or more instances of a.
  • For example, [a-z]+ is all strings of lower case letters, and [A-Za-z][A-Za-z0-9]* indicates all alphanumeric strings with a leading alphabetic character. This is a typical expression for recognizing identifiers in computer languages.

Alternation and grouping

The operator | indicates alternation. E.g. (ab|cd) matches either ab or cd. Parentheses are used for grouping, although they are not necessary on the outside level.
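The optional, repetition and alternation operators can be tried out without Lex by using the POSIX regcomp and regexec functions, whose extended syntax shares these operators. This is a sketch under that assumption: POSIX extended expressions are not identical to Lex expressions (the "..." quoting of Lex, for example, is not supported), and matches_whole is a hypothetical helper.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true when the whole of text matches pattern, interpreted as a
   POSIX extended regular expression. The pattern is wrapped as ^(...)$
   so that partial matches are rejected; here ^ and $ are POSIX anchors. */
bool matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return false;                       /* pattern too long for the buffer */

    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       /* pattern failed to compile */
    bool ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, ab?c accepts ac and abc but not abbc, a* accepts the empty string while a+ does not, and (ab|cd) accepts ab or cd but not abcd.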


The following library routines are used by the Lex analyzer:

  • yylex() - the default main() contains a call of yylex()
  • yymore() - append the next matched string to the current contents of yytext instead of replacing them
  • yyless(n) - retain the first n characters in yytext and return the rest to the input
  • yywrap() - is called whenever Lex reaches an end-of-file. The default yywrap() always returns 1.


There are two steps in compiling a Lex source program. First, the Lex source must be turned into a generated program in the host general-purpose language. Then this program must be compiled and loaded, usually with a library of Lex subroutines. The I/O library is defined in terms of the C standard library.

The C programs generated by Lex are slightly different on OS/370, because the OS compiler is less powerful than the UNIX or GCOS compilers, and does less at compile time.

The resulting program is placed on the usual file a.out for later execution. Although the default Lex I/O routines use the C standard library, the Lex automata themselves do not do so; if private versions of input, output and unput are given, the library can be avoided.


  • M. E. Lesk and E. Schmidt, Lex - A Lexical Analyzer Generator

Cite this page

Software for Lexical Analysis. (2020, Jun 01). Retrieved from