A description of the evaluation process

This chapter describes the process by which the evaluator turns a text string representing an XPath expression into a sequence of items. This description should act as a guide for anyone interested in examining the code, including the author (I'm writing it right now to help me check that I am being consistent throughout).

Conceptually, the process proceeds in a series of phases. In practice, there is some limited overlap between these phases. To get a good picture of how an XPath evaluation is supposed to work, see XML Path Language (XPath) 2.0.

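The steps taken by the Gobo XPath evaluation engine are: parsing the expression text, simplifying the parsed expression, performing static type checking and context-dependent optimizations, and evaluating the expression. Each step is described below, after some definitions.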

Items, Nodes, Values, Sequences, Iterators and Expressions

The input to the evaluation process is a text string representing an XPath Expression, together with a Context Item. First, then, some definitions:

Sequence
The result of evaluation is a sequence of zero or more Items. A sequence is a Value, but in the data model, it can also be represented as an Iterator.

A sequence of one item is completely interchangeable with the item itself.

The class for a sequence is XM_XPATH_SEQUENCE_VALUE.
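To make the relationship between a sequence and a single item concrete, here is a minimal Python sketch (not Gobo's Eiffel API) in which a sequence is a tuple of items and a lone item is promoted to a one-item sequence. The names as_sequence and as_singleton are hypothetical.

```python
from typing import Any, Optional, Tuple

# Hypothetical model: an XPath sequence is an immutable tuple of items.
# XPath sequences never nest, so a tuple is never itself an item.
Sequence = Tuple[Any, ...]


def as_sequence(value: Any) -> Sequence:
    """Treat a lone item as a sequence of length one."""
    if isinstance(value, tuple):
        return value
    return (value,)


def as_singleton(seq: Sequence) -> Optional[Any]:
    """Return the only item of a one-item sequence, or None for the empty sequence."""
    if len(seq) == 0:
        return None
    if len(seq) == 1:
        return seq[0]
    raise ValueError("sequence has more than one item")
```

Round-tripping an item through as_sequence and as_singleton gives back the same item, which is the sense in which a one-item sequence and its item are interchangeable.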

Item
An item is the basic unit of information coming out of the evaluation process. It is either an Atomic Value or a Node.

The class for an item is XM_XPATH_ITEM.

Node
A node is a constituent of the XML input tree. It is one of:
  • Document
  • Element
  • Attribute
  • Namespace
  • Text
  • Comment
  • Processing instruction

The class for a node is XM_XPATH_NODE.
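The item hierarchy can be sketched as follows. This is a Python illustration under the assumption that plain classes stand in for XM_XPATH_ITEM, XM_XPATH_ATOMIC_VALUE and XM_XPATH_NODE; the names Item, AtomicValue, Node and NodeKind, and their fields, are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, List, Optional


class NodeKind(Enum):
    """The seven kinds of node listed above."""
    DOCUMENT = auto()
    ELEMENT = auto()
    ATTRIBUTE = auto()
    NAMESPACE = auto()
    TEXT = auto()
    COMMENT = auto()
    PROCESSING_INSTRUCTION = auto()


class Item:
    """Abstract item: every item is either an AtomicValue or a Node."""


@dataclass
class AtomicValue(Item):
    """A single atomic value, such as an integer or a string."""
    value: Any


@dataclass
class Node(Item):
    """A constituent of the XML input tree."""
    kind: NodeKind
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
```

A result sequence can then mix both kinds of item, for example [AtomicValue(3), Node(NodeKind.ELEMENT, "para")].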

The architecture supports multiple implementations of the data model's tree structure. The only implementations at present are the standard tree implementation and the tiny tree implementation. In these implementations, the class for a node is XM_XPATH_TREE_NODE and XM_XPATH_TINY_NODE respectively.
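The point of supporting several implementations is that client code is written against the common node interface only. The Python sketch below illustrates that idea with two hypothetical layouts: an object per node linked by references, and a compact document holding node properties in parallel arrays, with nodes as lightweight index handles. These layouts are assumptions loosely suggested by the names "standard" and "tiny"; they are not a description of Gobo's actual data structures.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Node(ABC):
    """Interface shared by every tree implementation."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def parent(self) -> Optional["Node"]: ...


class TreeNode(Node):
    """Hypothetical standard layout: one object per node, linked by references."""

    def __init__(self, name: str, parent: Optional["TreeNode"] = None) -> None:
        self._name = name
        self._parent = parent

    def name(self) -> str:
        return self._name

    def parent(self) -> Optional[Node]:
        return self._parent


class TinyDocument:
    """Hypothetical compact layout: node properties held in parallel arrays."""

    def __init__(self) -> None:
        self.names: List[str] = []
        self.parents: List[int] = []  # -1 marks the root

    def add(self, name: str, parent: int = -1) -> "TinyNode":
        self.names.append(name)
        self.parents.append(parent)
        return TinyNode(self, len(self.names) - 1)


class TinyNode(Node):
    """A lightweight handle: a document plus an index into its arrays."""

    def __init__(self, doc: TinyDocument, index: int) -> None:
        self._doc = doc
        self._index = index

    def name(self) -> str:
        return self._doc.names[self._index]

    def parent(self) -> Optional[Node]:
        p = self._doc.parents[self._index]
        return None if p < 0 else TinyNode(self._doc, p)


def path_to_root(node: Node) -> List[str]:
    """Client code written against Node works with either implementation."""
    names = []
    current: Optional[Node] = node
    while current is not None:
        names.append(current.name())
        current = current.parent()
    return names
```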

Value
A value is the result of evaluating an Expression, but it is also an expression in its own right. However, it never has sub-expressions.

A value can be regarded as a Sequence, although sometimes it is a sequence of length one, or even zero.

The class for a value is XM_XPATH_VALUE.

Atomic Value
An atomic value is a value consisting of a single Item.

The class for an atomic value is XM_XPATH_ATOMIC_VALUE.

Expression
An expression is the basic building block of XPath. Expressions can be nested with full generality (the feature sub_expressions lists the sub-expressions). Expressions evaluate to a Sequence of Items.

The class for an expression is XM_XPATH_EXPRESSION.

An expression is either a Value or an instance of XM_XPATH_COMPUTED_EXPRESSION.
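The split between values and computed expressions can be pictured with a small Python sketch. Expression, Value and ComputedExpression mirror the Eiffel classes named above only loosely; SequenceConstructor (the comma operator) is a hypothetical example of a computed expression, and sequences are modelled as tuples.

```python
from typing import List, Tuple


class Expression:
    """Base of the expression tree."""

    def sub_expressions(self) -> List["Expression"]:
        return []

    def evaluate(self) -> Tuple[object, ...]:
        """Every expression evaluates to a sequence (here, a tuple)."""
        raise NotImplementedError


class Value(Expression):
    """A value is itself an expression, and never has sub-expressions."""

    def __init__(self, *items: object) -> None:
        self.items = tuple(items)

    def evaluate(self) -> Tuple[object, ...]:
        return self.items  # evaluating a value yields the value itself


class ComputedExpression(Expression):
    """Any expression that is not a value."""


class SequenceConstructor(ComputedExpression):
    """The comma operator: (a, b, ...) concatenates its operands' sequences."""

    def __init__(self, *operands: Expression) -> None:
        self.operands = list(operands)

    def sub_expressions(self) -> List[Expression]:
        return self.operands

    def evaluate(self) -> Tuple[object, ...]:
        result: Tuple[object, ...] = ()
        for operand in self.operands:
            result += operand.evaluate()
        return result
```

For example, SequenceConstructor(Value(1, 2), SequenceConstructor(Value(), Value(3))) nests with full generality and evaluates to (1, 2, 3); the empty Value() contributes a sequence of length zero.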

Iterator
An iterator is a construct within the programming model for traversing a Sequence of Items.

The class for an iterator is XM_XPATH_SEQUENCE_ITERATOR.
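A cursor-style iterator of the kind common in Eiffel libraries can be sketched in Python as below. The routine names start, forth, after and item follow the usual Eiffel cursor convention and are assumed here for illustration; they are not taken from XM_XPATH_SEQUENCE_ITERATOR.

```python
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class SequenceIterator(Generic[T]):
    """Hypothetical cursor over a sequence of items."""

    def __init__(self, items: Sequence[T]) -> None:
        self._items = items
        self._index = -1  # -1 means "before the first item"

    def start(self) -> None:
        self._index = 0

    def forth(self) -> None:
        self._index += 1

    def after(self) -> bool:
        return self._index >= len(self._items)

    def item(self) -> T:
        if self._index < 0 or self.after():
            raise IndexError("iterator is not positioned on an item")
        return self._items[self._index]


def collect(it: SequenceIterator[T]) -> List[T]:
    """Typical traversal loop: start, then forth until after."""
    result: List[T] = []
    it.start()
    while not it.after():
        result.append(it.item())
        it.forth()
    return result
```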

Parsing the expression text

XM_XPATH_EXPRESSION_FACTORY has a routine make_expression which takes a STRING (holding the text of the expression to be parsed) and an XM_XPATH_STATIC_CONTEXT. The result of calling make_expression is an optimized XM_XPATH_EXPRESSION, made available in parsed_expression. If a parse error has occurred, however, this will be Void; in that case is_parse_error will be set to True, and parsed_error_value will be set to an instance of XM_XPATH_ERROR_VALUE.

A side-effect is that functions and variables may well be bound in the static context.
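The way make_expression reports its outcome through attributes, rather than a return value, can be sketched in Python. The toy grammar (integer literals and $variables joined by "+"), the Plus result type and the bound_variables dictionary are all hypothetical; the sketch shows only the parse step and the reporting protocol, not Gobo's parser or its optimizations.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class ErrorValue:
    """Stand-in for XM_XPATH_ERROR_VALUE."""
    description: str


@dataclass
class StaticContext:
    """Hypothetical static context; records variables bound while parsing."""
    bound_variables: Dict[str, int] = field(default_factory=dict)


@dataclass
class Plus:
    """Parsed form of 'a + b + ...': integer literals and variable names."""
    operands: List[Union[int, str]]


class ExpressionFactory:
    """Results are left in attributes, mirroring the protocol described above."""

    _TOKEN = re.compile(r"\s*(?:(\d+)|\$([A-Za-z_]\w*))\s*")

    def __init__(self) -> None:
        self.parsed_expression: Optional[Plus] = None
        self.is_parse_error = False
        self.parsed_error_value: Optional[ErrorValue] = None

    def make_expression(self, text: str, context: StaticContext) -> None:
        self.parsed_expression = None
        self.is_parse_error = False
        self.parsed_error_value = None
        operands: List[Union[int, str]] = []
        for part in text.split("+"):
            match = self._TOKEN.fullmatch(part)
            if match is None:
                self.is_parse_error = True
                self.parsed_error_value = ErrorValue(f"unexpected token {part.strip()!r}")
                return  # parsed_expression stays None (Void)
            if match.group(1) is not None:
                operands.append(int(match.group(1)))
            else:
                name = match.group(2)
                # Side effect: the variable is bound (here, to a slot number)
                # in the static context.
                context.bound_variables.setdefault(name, len(context.bound_variables))
                operands.append(name)
        self.parsed_expression = Plus(operands)
```

A caller checks is_parse_error before using parsed_expression, exactly as it would after calling the Eiffel routine.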

Setting the debug-key "XPath expression factory" will cause make_expression to print a textual representation of the expression tree to the standard error stream, immediately after parsing succeeds.

Simplifying the parsed expression

If parsing is successful, XM_XPATH_EXPRESSION_FACTORY's make_expression routine goes on to call simplify on the expression. This performs context-independent optimizations on the expression and (recursively) on its sub-expressions. Current may be marked in error, so the caller of simplify must test is_error; if it is True, the error is available in error_value.

Note that if a simplification error occurs, make_expression treats it the same way as a parse error.
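The shape of a recursive simplification pass that marks errors, and of a caller that checks is_error, can be sketched in Python. The node classes, the toy division operator (with Python floor-division semantics, not XPath's) and the choice to report division by zero at folding time are all assumptions for illustration; constant folding is used as the example of a context-independent optimization.

```python
from typing import Optional, Tuple


class Expr:
    """Hypothetical expression node using the is_error/error_value convention."""

    def __init__(self) -> None:
        self.is_error = False
        self.error_value: Optional[str] = None

    def set_error(self, message: Optional[str]) -> None:
        self.is_error = True
        self.error_value = message

    def simplify(self) -> "Expr":
        return self


class Literal(Expr):
    def __init__(self, value: int) -> None:
        super().__init__()
        self.value = value


class Var(Expr):
    """A variable reference: its value is unknown until evaluation."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name


class Div(Expr):
    """A toy integer-division operator."""

    def __init__(self, left: Expr, right: Expr) -> None:
        super().__init__()
        self.left, self.right = left, right

    def simplify(self) -> Expr:
        # Simplify the sub-expressions first (recursively).
        self.left = self.left.simplify()
        self.right = self.right.simplify()
        for operand in (self.left, self.right):
            if operand.is_error:
                self.set_error(operand.error_value)  # propagate upwards
                return self
        if isinstance(self.left, Literal) and isinstance(self.right, Literal):
            # Context-independent optimization: fold constant operands.
            if self.right.value == 0:
                self.set_error("division by zero")
                return self
            return Literal(self.left.value // self.right.value)
        return self


def make_expression(parsed: Expr) -> Tuple[Optional[Expr], Optional[str]]:
    """Caller side: a simplification error is reported like a parse error."""
    simplified = parsed.simplify()
    if simplified.is_error:
        return None, simplified.error_value
    return simplified, None
```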

Setting the debug-key "XPath expression factory" will cause make_expression to print a textual representation of the simplified expression tree to the standard error stream, immediately after simplification.

Performing static type checking and context-dependent optimizations

After the simplification process is complete, the next phase is static analysis of the expression, to determine the types of all expressions. This is accomplished by calling analyze on the expression. (The picture here is itself simplified: simplify may be called again by later phases, especially if static analysis is unable to completely determine the type of an operand.)

The routine analyze takes an XM_XPATH_STATIC_CONTEXT as its sole parameter.

It may change the static context (this still needs to be checked).

Because analyze is a command, it is quite likely to change the expression in one of several ways.

Setting the debug-key "XPath evaluator" will cause evaluate to print a textual representation of the expression tree to the standard error stream, immediately after static analysis succeeds.
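As an illustration of what static analysis can do, the Python sketch below infers a static type for each node bottom-up, rewrites the tree by wrapping an untyped operand of "+" in a conversion node (XPath 2.0 casts xs:untypedAtomic arithmetic operands to xs:double), and reports a static type error for a string operand. It is not Gobo's analyze: the classes, the type strings and the dictionary used as a static context are hypothetical, and for simplicity analyze here returns the replacement expression, whereas the Eiffel routine is a command.

```python
from typing import Dict, Optional


class Expr:
    static_type: Optional[str] = None

    def analyze(self, context: Dict[str, str]) -> "Expr":
        """Return the (possibly replaced) expression, with static_type set."""
        raise NotImplementedError


class Literal(Expr):
    def __init__(self, value: object, type_name: str) -> None:
        self.value = value
        self.type_name = type_name

    def analyze(self, context: Dict[str, str]) -> Expr:
        self.static_type = self.type_name
        return self


class VarRef(Expr):
    def __init__(self, name: str) -> None:
        self.name = name

    def analyze(self, context: Dict[str, str]) -> Expr:
        # The (hypothetical) static context maps variable names to types.
        self.static_type = context[self.name]
        return self


class CastToDouble(Expr):
    def __init__(self, operand: Expr) -> None:
        self.operand = operand
        self.static_type = "xs:double"

    def analyze(self, context: Dict[str, str]) -> Expr:
        return self


class Add(Expr):
    NUMERIC = {"xs:integer", "xs:double"}

    def __init__(self, left: Expr, right: Expr) -> None:
        self.left, self.right = left, right

    def analyze(self, context: Dict[str, str]) -> Expr:
        self.left = self._prepare(self.left.analyze(context))
        self.right = self._prepare(self.right.analyze(context))
        types = {self.left.static_type, self.right.static_type}
        self.static_type = "xs:double" if "xs:double" in types else "xs:integer"
        return self

    @staticmethod
    def _prepare(operand: Expr) -> Expr:
        if operand.static_type == "xs:untypedAtomic":
            # Rewrite: wrap the untyped operand in a conversion to xs:double.
            return CastToDouble(operand)
        if operand.static_type not in Add.NUMERIC:
            raise TypeError(f"static type error: {operand.static_type} in '+'")
        return operand
```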

Evaluating the expression

If static analysis is successful, evaluate proceeds to the evaluation stage. XM_XPATH_EXPRESSION has no fewer than six routines for performing evaluation. All of them take an XM_XPATH_CONTEXT as their sole parameter, though this may be Void on occasion. If it is not Void, it is liable to be altered by any of these routines (as the context_item is liable to change), so none of them is a pure function.
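Why these routines are not pure can be seen in a small Python sketch of a filter expression, base[predicate], which moves the context item over each base item while evaluating the predicate. Only two entry points are modelled, with the hypothetical names evaluate_item and iterate; they should not be read as the names of Gobo's six routines. Context, Literal, IsEven and Filter are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Context:
    """Hypothetical dynamic context; only the context item is modelled."""
    item: object = None


class Literal:
    """Needs no context at all, so None is an acceptable argument."""

    def __init__(self, value: object) -> None:
        self.value = value

    def evaluate_item(self, context: Optional[Context]) -> object:
        return self.value

    def iterate(self, context: Optional[Context]) -> Iterator[object]:
        yield self.value


class IsEven:
    """The predicate '. mod 2 = 0': it reads the context item."""

    def evaluate_item(self, context: Optional[Context]) -> bool:
        if context is None:
            raise RuntimeError("context item is undefined")
        return context.item % 2 == 0


class Filter:
    """'base[predicate]': moves the context item over each base item in turn."""

    def __init__(self, base: List[object], predicate: IsEven) -> None:
        self.base = base
        self.predicate = predicate

    def iterate(self, context: Context) -> Iterator[object]:
        for item in self.base:
            context.item = item  # side effect on the caller's context
            if self.predicate.evaluate_item(context):
                yield item

    def evaluate_item(self, context: Context) -> object:
        """Single-item entry point: the first selected item, or None if none."""
        return next(self.iterate(context), None)
```

After list(Filter([1, 2, 3, 4], IsEven()).iterate(ctx)) returns [2, 4], ctx.item is left holding 4 rather than its original value, which is exactly the kind of alteration described above.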

Copyright 2004, Colin Adams and others
Last Updated: Thursday, April 15th, 2004