
The persistent TextArray of the XEE Project

Introduction

The XEE Project, short for "XML Query Execution Engine", is a prototype of a query and management system for XML documents. At its heart is a special data structure called the Access Support Tree / TextArray (AST/TA). Within the AST/TA model, the TextArray manages the text content of XML documents.

I have implemented this TextArray for block-oriented secondary storage media; the result is called the persistent TextArray (or "pTA" for short). It uses the well-known technique of a positional B-Tree, which allows an implementation that handles both queries and updates efficiently.
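For readers unfamiliar with the technique, here is a minimal in-memory sketch of the idea behind a positional B-Tree. It is an illustration only and not the pTA code: the names `Leaf`, `Inner`, `char_at`, `insert`, `text_of`, and the tiny `BLOCK` and `FANOUT` values are all assumptions made for this example. Inner nodes store character counts per subtree instead of keys, so a text position is found by subtracting counts on the way down, and an insert only touches the blocks along one path. The sketch keeps all leaves at the same depth but does not enforce a minimum fill or write anything to disk.

```python
BLOCK = 8    # hypothetical leaf capacity in characters (a real block holds far more)
FANOUT = 4   # hypothetical maximum number of children per inner node


class Leaf:
    """One text block."""
    def __init__(self, text):
        self.text = text

    def size(self):
        return len(self.text)


class Inner:
    """Inner node: stores per-child character counts instead of keys."""
    def __init__(self, children):
        self.children = children
        self.counts = [c.size() for c in children]

    def size(self):
        return sum(self.counts)


def char_at(node, pos):
    """Return the character at absolute text position pos."""
    while isinstance(node, Inner):
        for child, count in zip(node.children, node.counts):
            if pos < count:
                node = child
                break
            pos -= count  # skip this subtree without reading its blocks
        else:
            raise IndexError("position out of range")
    return node.text[pos]


def _chunks(items, n):
    """Cut a string or list into pieces of at most n elements."""
    return [items[i:i + n] for i in range(0, len(items), n)]


def _insert(node, pos, s):
    """Insert s at pos below node; return the node(s) that replace it."""
    if isinstance(node, Leaf):
        text = node.text[:pos] + s + node.text[pos:]
        # An overfull leaf is split into blocks of at most BLOCK characters.
        return [Leaf(t) for t in _chunks(text, BLOCK)] or [Leaf("")]
    # Find the child covering pos; the last child also accepts an append.
    i = 0
    while i < len(node.children) - 1 and pos > node.counts[i]:
        pos -= node.counts[i]
        i += 1
    node.children[i:i + 1] = _insert(node.children[i], pos, s)
    # Rebuild this node (refreshing its counts), splitting it if overfull.
    return [Inner(group) for group in _chunks(node.children, FANOUT)]


def insert(root, pos, s):
    """Insert s at text position pos and return the (possibly new) root."""
    if not 0 <= pos <= root.size():
        raise IndexError("position out of range")
    nodes = _insert(root, pos, s)
    while len(nodes) > 1:  # grow new levels until a single root remains
        nodes = [Inner(group) for group in _chunks(nodes, FANOUT)]
    return nodes[0]


def text_of(node):
    """Concatenate all blocks in order (a full scan, for checking only)."""
    if isinstance(node, Leaf):
        return node.text
    return "".join(text_of(c) for c in node.children)
```

A short usage example: start with `root = Leaf("")`, call `root = insert(root, 0, "The quick brown fox")`, then `root = insert(root, 4, "very ")`. The second call inserts a single word and rewrites only the leaf containing position 4 plus the count entries on the path to it, which is the word-to-phrase sized update pattern this page describes.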

Along the way I wrote a study that examines the environment of the AST/TA and the use of XML documents that are both queried and updated. The operations demand much more than scanning the text: modifications to the text content are common, and in many cases they are the size of a word or a phrase of natural language.

The positional B-Tree turns out to be well suited to this task, and the study contains tables and diagrams that show the results of the implementation. A section on future work describes how this implementation could later be extended to support specific characteristics of the AST/TA model.

Links and Papers

Many of these pages are in German.
guidod-studienarbeit-vortrag-2002.pdf
Presentation about the implementation of the persistent TextArray.
It also covers large parts of the study's content in pictures, diagrams, and overviews (speech time: 70 min).
guidod-separator-problem-2002.pdf
A few raw pages of a presentation that covers my view on the separator problems inherent in the original form of the AST/TA model.
These pages are somewhat unfinished, but they were used on the same day as the pTA presentation above (speech time: 10 min).
guidod-studienarbeit-vortrag-2003.pdf
A presentation covering experiences from using the AST/TA model with the XML-g Project for annotation of C sources and generating autodoc documentation.
guidod-studienarbeit-pTA-2003.pdf
The final document on the design and implementation of the pTA, the persistent TextArray of the AST/TA within the XEE Project.
http://dbis.informatik.hu-berlin.de/research/XML/xee
The home page of the XEE Project at the Humboldt University Berlin (alma mater berolinensis). A lot more information can be found there.
http://dbis.informatik.hu-berlin.de/pub/papers/techreports/HUB-IB-157.pdf
Technical Report HUB-IB-157, December 2001:
Dieter Scheffner:
Access Support Tree & TextArray: Data Structures for XML Document Storage
http://dbis.informatik.hu-berlin.de/pub/papers/techreports/HUB-IB-158.pdf
Technical Report HUB-IB-158, March 2002:
Dieter Scheffner, Johann-Christoph Freytag:
The XML Query Execution Engine (XEE)
http://computer.org/proceedings/ssdbm/1632/16320155abs.htm
Proceedings of the 14th International Conference on Scientific and Statistical Database Management, Edinburgh, July 2002:
Dieter Scheffner, Johann-Christoph Freytag:
Access Support Tree and TextArray: A Data Structure for XML Document Storage and Retrieval

(C) 2003 Guido Draheim 31-Jan-2003