Regular Expression Parser and Compiler

I have been given an assignment focused on regular expressions (regexp) and parsing. This assignment is split into two parts. A parser and compiler that converts a regular expression into a set of Finite State Machines (FSM) that can then be loaded into a second piece of software that searches a piece of text for patterns that match the regular expression. I will work on the first part and my partner will work on the second.

The parser and compiler will adhere to a set of rules that have been laid out for us in the assignment document. An expression will then be run through that set of rules and the program will output a set of FSM that the second piece can then read. This will require format to be agreed with my partner but I feel this should be relatively simple. The rules for this compiler are as follows:

Rules

  • Any non-special character is a literal
  • . (period) is a wild card that can be any literal
  • adjacent regexps are concatenated to a single regexp
  • * indicates closure (zero or more) on the preceding regexp
  • ? indicates that the preceding regexp can occur zero or one times
  • | is an infix alternation operator such that if r and e are regexps r|e means r or e
  • ( and ) may enclose a regexp to raise its precedence such that (e) is a regexp that is equivilent to e
  • \ is an escape character that matches λ but indicates that the following character loses any special meaning and is a literal
  • Operator precedence is as follows:
    • Escaped characters
    • Paretheses
    • Repetition
    • Concatenation
    • Alternation
  • Optional: ! not operator

An example of this would be (r | e) a*:

State0123456
Typebrbrrebraλ
Next 1124454.
Next 213..6..
FMS Table of Regular Expression Above
Regular Expression FSM graph

I will continue to work on this over the next few weeks and provide updates as I go along with the development process.

Leave a Reply

Your email address will not be published. Required fields are marked *