Box queries over multi-dimensional streams

Roy Friedman, Rana Shahout

Research output: Contribution to journal › Article › peer-review

Abstract

Answering statistical queries about streams of data that arrive online is becoming increasingly important. Such data often includes multiple attributes, so data elements can be viewed as points in a multi-dimensional universe. This paper extends existing work on streaming algorithms by studying how to perform box queries on online multi-dimensional data streams. We develop three algorithms, C-DARQ, DARQ, and MARQ, that support such queries for a large number of statistical functions, including (but not limited to) counting, frequency estimation, and heavy hitters. We also apply our algorithms in distributed settings, in which measurements are recorded independently by multiple sites (e.g., multiple routers) and the goal is to obtain a global network analysis. The protocols are analyzed and evaluated over a synthetic dataset, a Chicago dataset, and a Facebook dataset from Kaggle, in multiple dimensions (up to 10). Our algorithms asymptotically improve the space bounds, as well as the update and query performance, of existing works. Unlike known approaches, our algorithms can also be used to solve a larger class of problems beyond counting. We further discuss extending our work to the sliding window model and to the case where the dimensions’ bounds are a priori unknown.
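
To make the query semantics concrete, the following minimal sketch shows what a box (axis-aligned range) count query asks for. It is an exact, store-everything baseline written only for illustration; it is not C-DARQ, DARQ, or MARQ, and the names `in_box` and `box_count` are hypothetical, not taken from the paper.

```python
from typing import Iterable, Sequence

Point = Sequence[int]


def in_box(point: Point, lo: Point, hi: Point) -> bool:
    """Return True if the point lies inside the closed box [lo, hi] in every dimension."""
    return all(l <= x <= h for x, l, h in zip(point, lo, hi))


def box_count(stream: Iterable[Point], lo: Point, hi: Point) -> int:
    """Exact count of stream elements that fall inside the box.

    This scans every element, so it needs space and time linear in the
    stream length; streaming algorithms aim to answer the same question
    approximately with far less space.
    """
    return sum(1 for p in stream if in_box(p, lo, hi))


# Illustrative two-dimensional stream (made-up values).
stream = [(1, 5), (3, 2), (4, 4), (7, 1), (3, 3)]
count = box_count(stream, lo=(2, 2), hi=(4, 4))
print(count)  # 3: the points (3, 2), (4, 4), and (3, 3) lie inside the box
```

The same predicate could in principle gate other statistics over the box, such as per-item frequencies, which is the sense in which box queries generalize beyond counting.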

Original language: English
Article number: 102086
Journal: Information Systems
Volume: 109
State: Published - Nov 2022
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
