Fishing in the stream: Similarity search over endless data

Naama Kraus, David Carmel, Idit Keidar

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Similarity search is the task of retrieving data items that are similar to a given query. In this paper, we introduce the time-sensitive notion of similarity search over endless data-streams (SSDS), which takes into account data quality and temporal characteristics in addition to similarity. SSDS is challenging as it needs to process unbounded data, while computation resources are bounded. We propose Stream-LSH, a randomized SSDS algorithm that bounds the index size by retaining items according to their freshness, quality, and dynamic popularity attributes. We show that Stream-LSH increases recall when searching for similar items compared to alternative approaches using the same space capacity.

Original languageEnglish
Title of host publicationProceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
EditorsJian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Pages964-969
Number of pages6
ISBN (Electronic)9781538627143
DOIs
StatePublished - 1 Jul 2017
Event5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: 11 Dec 201714 Dec 2017

Publication series

NameProceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume2018-January

Conference

Conference5th IEEE International Conference on Big Data, Big Data 2017
Country/TerritoryUnited States
CityBoston
Period11/12/1714/12/17

Keywords

  • Dynamic popularity
  • Locality sensitive hashing
  • Retention policy
  • Similarity search
  • Stream search

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Control and Optimization

Fingerprint

Dive into the research topics of 'Fishing in the stream: Similarity search over endless data'. Together they form a unique fingerprint.

Cite this