Abstract
We study how to efficiently evaluate queries over XML documents that
are represented according to the XML specification, i.e., as XML
files. The software architecture is as follows: the XML engine
(i.e., XML parser) makes the structure of the documents explicit. The
query processor operates directly on the output of the XML engine. We
see two basic alternatives for how such a query processor can operate:
event-based and tree-based. In the first case, the query processor
immediately checks for each event, e.g., the beginning of an element,
whether it contributes to a query result or invalidates current partial
results. In the second case, the query processor generates an
explicit transient representation of the document structure and
evaluates the query set-at-a-time. This work evaluates these
approaches, along with several optimizations, in quantitative terms.
Our main results are as follows: the event-based evaluation scheme is
approximately 10% faster than the tree-based one, even with all the
optimizations from this article. The overhead of the query processors
is small compared to the running time of the XML engine. Finally,
exploiting DTD information in this particular context does not improve
performance.
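To make the two evaluation schemes concrete, here is a minimal sketch in Python. It is not the implementation studied in the paper: it uses the standard library's `xml.sax` (event-based) and `xml.etree.ElementTree` (tree-based) as stand-ins for the XML engine, and the sample document, the query (the text of every `title` element whose parent is a `book`), and the helper names `TitleUnderBook`, `event_based`, and `tree_based` are hypothetical.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document and query: select the text of every
# <title> element whose parent is a <book> element.
DOC = """<library>
  <book><title>XML Basics</title><year>2001</year></book>
  <journal><title>Not a book</title></journal>
  <book><title>Query Processing</title></book>
</library>"""


class TitleUnderBook(xml.sax.ContentHandler):
    """Event-based evaluation: each parser event is checked immediately
    to see whether it starts, extends, or completes a result."""

    def __init__(self):
        super().__init__()
        self.path = []       # names of the currently open elements
        self.buffer = None   # partial result: text of a matching <title>
        self.results = []

    def startElement(self, name, attrs):
        self.path.append(name)
        # A <title> directly inside a <book> opens a partial result.
        if self.path[-2:] == ["book", "title"]:
            self.buffer = []

    def characters(self, content):
        # Text is only kept while a partial result is open.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        # Closing the matching <title> turns the partial result into a result.
        if self.buffer is not None and self.path[-2:] == ["book", "title"]:
            self.results.append("".join(self.buffer))
            self.buffer = None
        self.path.pop()


def event_based(doc):
    handler = TitleUnderBook()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.results


def tree_based(doc):
    # Build a transient in-memory tree first, then evaluate the path
    # expression over the whole tree set-at-a-time.
    root = ET.fromstring(doc)
    return [t.text for t in root.findall(".//book/title")]


print(event_based(DOC))  # ['XML Basics', 'Query Processing']
print(tree_based(DOC))   # ['XML Basics', 'Query Processing']
```

Both functions return the same titles in document order. The event-based variant never materializes the document, whereas the tree-based variant pays for building the full tree before the query runs; the abstract reports that, in the authors' measurements, this difference amounts to roughly 10% in favor of the event-based scheme.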