Grammar Rules PreviousNext

The grammar rules determine the syntax of a language. A rule has the following general form:

RESULT: COMPONENTS...
    ;

where RESULT is the nonterminal symbol that this rule describes and COMPONENTS are various terminal and nonterminal symbols that are put together by this rule. For example,

exp: exp '+' exp
    ;

says that two groupings of type exp, with a + token in between, can be combined into a larger grouping of type exp. Whitespace in rules is significant only to separate symbols. You can add extra whitespace as you wish.

Actions

Scattered among the components can be actions that determine the semantics of the rule. An action looks like this:

{ Eiffel Instructions }

Usually there is only one action and it follows the components.

Note that if the Eiffel code contains Unicode characters, the input file should use the UTF-8 encoding and start with the BOM character.

Multiple Rules

Multiple rules for the same RESULT can be written separately or can be joined with the vertical-bar character | as follows:

RESULT: RULE1-COMPONENTS...
    | RULE2-COMPONENTS...
    ...
    ;

They are still considered distinct rules even when joined in this way.

In order to avoid mistakes such as giving the same RESULT name to two unrelated rules in the grammar, geyacc generates a warning whenever rules for the same RESULT have not been joined.

Empty Rules

If COMPONENTS in a rule is empty, it means that RESULT can match the empty string. For example, here is how to define a comma-separated sequence of zero or more exp groupings:

expseq: -- Empty
    | expseq1
    ;

expseq1: exp
    | expseq1 ',' exp
    ;

It is customary to write a comment -- Empty in each rule with no components.

Recursive Rules

A rule is called "recursive" when its RESULT nonterminal appears also on its right hand side. Nearly all geyacc grammars need to use recursion, because that is the only way to define a sequence of any number of something. Consider this recursive definition of a comma-separated sequence of one or more expressions:

expseq1: exp
    | expseq1 ',' exp
    ;

Since the recursive use of expseq1 is the leftmost symbol in the right hand side, we call this left recursion. By contrast, here the same construct is defined using right recursion:

expseq1: exp
    | exp ',' expseq1
    ;

Any kind of sequence can be defined using either left recursion or right recursion, but it is recommended to use left recursion, because it can parse a sequence of any number of elements with bounded stack space. Right recursion uses up space on the geyacc stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once. See the parser algorithm for further explanation of this.

Indirect or mutual recursion occurs when the result of the rule does not appear directly on its right hand side, but does appear in rules for other nonterminals which do appear on its right hand side. For example:

expr: primary
    | primary '+' primary
    ;

primary: constant
    | '(' expr ')'
    ;

defines two mutually-recursive nonterminals, since each refers to the other.


Copyright © 1998-2019, Eric Bezault
mailto:
ericb@gobosoft.com
http:
//www.gobosoft.com
Last Updated: 25 Septembeer 2019

HomeTocPreviousNext