A situation where shift/reduce conflicts appear is in arithmetic expressions. Here shifting is not always the preferred resolution; the geyacc declarations for operator precedence allow you to specify when to shift and when to reduce.
Consider the following ambiguous grammar fragment (ambiguous because the input 1 - 2 * 3 can be parsed in two different ways):
expr: expr '-' expr | expr '*' expr | expr '<' expr | '(' expr ')' ... ;
Suppose the parser has seen the tokens 1, - and 2; should it reduce them via the rule for the subtraction operator? It depends on the next token. Of course, if the next token is ), we must reduce; shifting is invalid because no single rule can reduce the token sequence - 2 ) or anything starting with that. But if the next token is * or <, we have a choice: either shifting or reduction would allow the parse to complete, but with different results.
To decide which one geyacc should do, we must consider the results. If the next operator token OP is shifted, then it must be reduced first in order to permit another opportunity to reduce the sum. The result is (in effect) 1 - (2 OP 3). On the other hand, if the subtraction is reduced before shifting OP, the result is (1 - 2) OP 3. Clearly, then, the choice of shift or reduce should depend on the relative precedence of the operators - and OP: * should be shifted first, but not <.
What about input such as 1 - 2 - 5; should this be (1 - 2) - 5 or should it be 1 - (2 - 5)? For most operators we prefer the former, which is called left association. The latter alternative, right association, is desirable for assignment operators. The choice of left or right association is a matter of whether the parser chooses to shift or reduce when the stack contains 1 - 2 and the look-ahead token is -: shifting makes right-associativity.
Geyacc allows you to specify these choices with the operator precedence declarations %left and %right. Each such declaration contains a list of tokens, which are operators whose precedence and associativity is being declared. The %left declaration makes all those operators left-associative and the %right declaration makes them right-associative. A third alternative is %nonassoc, which declares that it is a syntax error to find the same operator twice "in a row".
The relative precedence of different operators is controlled by the order in which they are declared. The first %left or %right declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on.
In our example, we would want the following declarations:
%left '<' %left '-' %left '*'
In a more complete example, which supports other operators as well, we would declare them in groups of equal precedence. For example, '+' is declared with '-':
%left '<' '>' '=' NE LE GE %left '+' '-' %left '*' '/'
Here NE and so on stand for the operators for "not equal" and so on. We assume that these tokens are more than one character long and therefore are represented by names, not character literals.
The first effect of the precedence declarations is to assign precedence levels to the terminal symbols declared. The second effect is to assign precedence levels to certain rules: each rule gets its precedence from the last terminal symbol mentioned in the components. (You can also specify explicitly the precedence of a rule.)
Finally, the resolution of conflicts works by comparing the precedence of the rule being considered with that of the look-ahead token. If the token's precedence is higher, the choice is to shift. If the rule's precedence is higher, the choice is to reduce. If they have equal precedence, the choice is made based on the associativity of that precedence level. The verbose output file made by -v says how each conflict was resolved.
Not all rules and not all tokens have precedence. If either the rule or the look-ahead token has no precedence, then the default is to shift.
Often the precedence of an operator depends on the context. This sounds outlandish at first, but it is really very common. For example, a minus sign typically has a very high precedence as a unary operator, and a somewhat lower precedence (lower than multiplication) as a binary operator.
The geyacc precedence declarations, %left, %right and %nonassoc, can only be used once for a given token; so a token has only one precedence declared in this way. For context-dependent precedence, you need to use an additional mechanism: the %prec modifier for rules. The %prec modifier declares the precedence of a particular rule by specifying a terminal symbol whose precedence should be used for that rule. It's not necessary for that symbol to appear otherwise in the rule. The modifier's syntax is:
and it is written after the components of the rule. Its effect is to assign the rule the precedence of TERMINAL-SYMBOL, overriding the precedence that would be deduced for it in the ordinary way. The altered rule precedence then affects how conflicts involving that rule are resolved.
Here is how %prec solves the problem of unary minus. First, declare a precedence for a fictitious terminal symbol named UMINUS. There are no tokens of this type, but the symbol serves to stand for its precedence:
... %left '+' '-' %left '*' %left UMINUS
Now the precedence of UMINUS can be used in specific rules:
exp: ... | exp '-' exp ... | '-' exp %prec UMINUS
Copyright © 1998-2005, Eric
Last Updated: 22 February 2005