Expressing and Optimizing Similarity-Based Queries in SQL

Authors: , Min Wang, Sriram Padmanabhan, X. Sean Wang

Tags: 2004, conceptual modeling

Searching a large set for objects similar to a given query object, that is, for its near and nearest neighbors, is an essential task in many applications. Recent years have seen great progress toward efficient algorithms for this task. This paper takes a query-language perspective, equipping SQL with near and nearest-neighbor search capability by adding a user-defined predicate (UDP) called NN-UDP. The predicate indicates whether an object is, among a set of objects, a near or nearest neighbor of a given query object. Using NN-UDP makes queries that involve similarity search intuitive to express. Unfortunately, traditional cost-based optimization methods designed for ordinary UDPs do not work well for such SQL queries. Better execution plans become possible with the introduction of a new operator, called NN-OP, which finds the near or nearest neighbors of a given query object from a set of objects. The optimization algorithm proposed in this paper can produce such plans, which take advantage of the efficient search algorithms developed in recent years. To assess the proposed optimization algorithm, the paper focuses on applications that deal with streaming time series. Experimental results show that the optimization strategy is effective.
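
The contrast between a per-row predicate and a set-level operator can be illustrated with a small Python sketch. This is not the paper's algorithm or syntax: the function names nn_udp, filter_with_udp, and nn_op, the Euclidean distance, and the parameter k are all assumptions made for illustration, and only the k-nearest case is shown. The point is that evaluating the predicate row by row rescans the whole set for every candidate, while a dedicated operator can find the same neighbors in a single pass.

```python
import heapq
import math
from typing import List, Sequence

Point = Sequence[float]


def distance(a: Point, b: Point) -> float:
    # Euclidean distance is an assumption; the abstract does not name a measure.
    return math.dist(a, b)


def nn_udp(candidate: Point, query: Point, objects: List[Point], k: int) -> bool:
    # Hypothetical per-row predicate: True if fewer than k objects are
    # strictly closer to the query than the candidate is.
    d = distance(candidate, query)
    closer = sum(1 for o in objects if distance(o, query) < d)
    return closer < k


def filter_with_udp(objects: List[Point], query: Point, k: int) -> List[Point]:
    # Naive plan: apply the predicate to every row. Each call scans the
    # whole set, so the total work is quadratic in the number of objects.
    return [o for o in objects if nn_udp(o, query, objects, k)]


def nn_op(objects: List[Point], query: Point, k: int) -> List[Point]:
    # Hypothetical set-level operator: one pass with a bounded heap.
    # A real system could substitute an index-based search here.
    return heapq.nsmallest(k, objects, key=lambda o: distance(o, query))


objects = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.0), (9.0, 9.0)]
query = (0.0, 0.0)

udp_result = filter_with_udp(objects, query, k=2)
op_result = nn_op(objects, query, k=2)
print(sorted(udp_result) == sorted(op_result))
```

When all distances to the query are distinct, as in this sample data, both plans return the same objects. With ties the predicate form may admit more than k rows, which is one reason the two formulations are not interchangeable without care.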

Read the full paper here: https://link.springer.com/chapter/10.1007/978-3-540-30464-7_36