PowerDB-IR - Scalable Information Retrieval and Storage with a Cluster of Databases

Title: PowerDB-IR - Scalable Information Retrieval and Storage with a Cluster of Databases
Author(s): T. Grabs, K. Böhm, H.J. Schek
Type: article
Booktitle: Knowledge and Information Systems (2004) 6: 465-505, ISSN: 0219-1377 (Paper), 0219-3116 (Online), Link: http://dx.doi.org/10.1007/s10115-003-0120-y (Springer-Verlag London Ltd. 2004)
Organization: Institute of Information Systems, ETH Zurich
Month: July
Year: 2004

Abstract
Our objective is a scalable infrastructure for information retrieval
(IR) with up-to-date retrieval results in the presence of updates. Timely
processing of updates is important with novel application domains such as
e-commerce. These issues are challenging, given the additional requirement
that the system must scale well. We have built PowerDB-IR, a system
that has the characteristics sought. This article describes its design,
implementation, and evaluation. We follow a three-tier architecture
with a database cluster as the bottom layer for storage management.
The rationale for a database cluster is to 'scale out', i.e., to
add further cluster nodes, whenever necessary for better performance. The
middle tier provides IR-specific retrieval and update services. We
deploy state-of-the-art middleware software to coordinate the cluster
and to invoke IR-specific components. PowerDB-IR extends the middleware
layer with service decomposition and parallelisation. PowerDB-IR has
the following features: It supports state-of-the-art retrieval models
such as vector space retrieval. It allows documents to be inserted
and retrieved concurrently and ensures up-to-date retrieval results
with almost no overhead. PowerDB-IR ensures the correctness of global
concurrency and recovery. Alternative physical data organisation schemes
and respective query processing techniques provide adequate performance
for different workloads and database sizes. Scaling out the database
cluster yields higher throughput and lower response times. We have run
extensive experiments with PowerDB-IR using several commercial database
systems as well as different middleware products. Further experiments have
quantified the effect of transactional guarantees on performance. The
main result is that PowerDB-IR shows surprisingly good scalability and
low response times.
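
As a rough illustration of the ideas named in the abstract (vector space retrieval, service decomposition, and parallel execution over cluster nodes), the following minimal sketch partitions documents across in-memory nodes, scores a query on each node in parallel, and merges the partial rankings. It is not PowerDB-IR's implementation: the `Node` class, the `search` function, and the `log(1 + N/df)` term weighting are assumptions made for this example, and plain Python dictionaries stand in for the database systems.

```python
import heapq
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for one cluster node holding a document partition."""

    def __init__(self):
        self.index = {}      # term -> {doc_id: term frequency}
        self.doc_count = 0

    def insert(self, doc_id, text):
        # Update the node-local inverted index; the document is searchable at once.
        for term, tf in Counter(text.lower().split()).items():
            self.index.setdefault(term, {})[doc_id] = tf
        self.doc_count += 1

    def doc_freq(self, term):
        return len(self.index.get(term, {}))

    def score(self, weights, k):
        # Node-local subquery: accumulate tf * idf per document, keep the top k.
        scores = Counter()
        for term, idf in weights.items():
            for doc_id, tf in self.index.get(term, {}).items():
                scores[doc_id] += tf * idf
        return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


def search(nodes, query, k=3):
    """Decompose a query into per-node subqueries and merge the partial results."""
    terms = set(query.lower().split())
    total = sum(node.doc_count for node in nodes)
    weights = {}
    for term in terms:
        # Global document frequency is the sum of the node-local frequencies.
        df = sum(node.doc_freq(term) for node in nodes)
        if df:
            weights[term] = math.log(1 + total / df)  # assumed idf variant
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.score(weights, k), nodes))
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(k, merged, key=lambda item: item[1])


# Distribute a few toy documents round-robin over two nodes.
nodes = [Node(), Node()]
docs = {
    "d1": "database cluster storage",
    "d2": "information retrieval with a database cluster",
    "d3": "vector space retrieval",
    "d4": "e-commerce updates",
}
for i, (doc_id, text) in enumerate(docs.items()):
    nodes[i % len(nodes)].insert(doc_id, text)

results = search(nodes, "retrieval cluster", k=3)
```

Because each document lives on exactly one node, taking the top k per node and then the top k of the merged list gives the same ranking as scoring all documents in one place. The sketch leaves out the transactional guarantees (global concurrency control and recovery) that the abstract describes.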
!!! This document is stored in the
ETH Web archive and is no longer maintained !!!