KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS
Description
BOLIN DING, YINTAO YU, BO ZHAO, CINDY XIDE LIN, JIAWEI HAN, AND CHENGXIANG ZHAI
Abstract. We study the problem of keyword search in a data cube with text-rich dimension(s)
(a so-called text cube). The text cube is built on a multidimensional text database, where each row
is associated with some text data (e.g., a document) and other structural dimensions (attributes).
A cell in the text cube aggregates a set of documents with matching attribute values in a subset
of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword
query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of
cell documents w.r.t. the given query) in the text cube.
We define a keyword-based query language and apply an IR-style relevance model for scoring and
ranking cell documents in the text cube. We propose two efficient approaches to find the top-k
answers. The proposed approaches support a general class of IR-style relevance scoring formulas
that satisfy certain basic and common properties. One approach spends more time on pre-processing
and less on answering online queries; the other is more efficient in pre-processing but
consumes more time for online queries. Experimental studies on the ASRS dataset are conducted
to verify the efficiency and effectiveness of the proposed approaches.
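To make the definitions in the abstract concrete, the following is a minimal brute-force sketch of the problem statement only. It is not either of the two approaches the paper proposes. The toy rows, the dimension names `phase` and `weather`, the `*` marker for an aggregated dimension, and the normalized term-frequency score are all assumptions chosen for illustration; the paper's actual query language and relevance formulas are not reproduced here.

```python
from collections import Counter
from itertools import combinations
import heapq

# Hypothetical toy database: each row pairs structural attributes with a document.
DIMS = ("phase", "weather")
ROWS = [
    ({"phase": "landing", "weather": "fog"}, "runway incursion in fog"),
    ({"phase": "landing", "weather": "clear"}, "hard landing on runway"),
    ({"phase": "takeoff", "weather": "fog"}, "engine failure after takeoff"),
]


def build_cells(rows, dims):
    """Group documents into cells.

    A cell fixes attribute values on a subset of dimensions; every other
    dimension is aggregated and shown as "*".
    """
    cells = {}
    for attrs, text in rows:
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                key = tuple(attrs[d] if d in subset else "*" for d in dims)
                cells.setdefault(key, []).append(text)
    return cells


def score(cell_document, query_terms):
    """Assumed relevance score: query-term occurrences divided by document length."""
    tokens = cell_document.lower().split()
    if not tokens:
        return 0.0
    term_freq = Counter(tokens)
    return sum(term_freq[t] for t in query_terms) / len(tokens)


def top_k_cells(rows, dims, query, k):
    """Score every cell document (concatenation of the cell's documents), keep the k best."""
    query_terms = query.lower().split()
    cells = build_cells(rows, dims)
    scored = [
        (cell, score(" ".join(docs), query_terms)) for cell, docs in cells.items()
    ]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for cell, relevance in top_k_cells(ROWS, DIMS, "runway fog", k=2):
        print(cell, relevance)
```

On this toy data the query "runway fog" ranks the cell (landing, fog) first and (landing, *) second. Enumerating every cell this way grows exponentially with the number of dimensions, which is the cost that motivates more efficient top-k approaches.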
Resources
| Name | Format | Description | Link |
| | 33 | KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS | https://c3.nasa.gov/dashlink/static/media/publication/Paper_12_.pdf |