Efficiently Archiving Photos under Storage Constraints

Susan B. Davidson, Shay Gershtein, Tova Milo, Slava Novgorodov, May Shoshan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Our ability to collect data is rapidly outstripping our ability to effectively store and use it. Organizations are therefore facing tough decisions about what data to archive (or dispose of) to meet their business goals. We address this general problem in the context of image data (photos) by proposing which photos to archive so as to meet an online storage budget. The decision is based on factors such as usage patterns and their relative importance, the quality and size of a photo, the relevance of a photo to a usage pattern, the similarity between different photos, and policy requirements specifying which photos must be retained. We formalize the photo archival problem, analyze its complexity, and give two approximation algorithms: one comes with an optimal approximation guarantee, and the other, more scalable, comes with both worst-case and data-dependent guarantees. Based on these algorithms, we implement an end-to-end system, PHOcus, and discuss how to automatically derive its inputs in many settings. An extensive experimental study on public as well as private datasets demonstrates the effectiveness and efficiency of PHOcus. Furthermore, a user study with business analysts in a real e-commerce application shows that it can save a tremendous amount of human effort and yield unexpected insights.
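The abstract lists the ingredients of the photo archival problem (pattern weights, photo size and quality, relevance, similarity, must-retain policies) but this record does not state the formal objective or the PHOcus algorithms. Purely as an illustration of how such ingredients can combine, the sketch below assumes a hypothetical facility-location style score (each photo relevant to a usage pattern is credited by how similar its closest retained photo is) and applies a generic cost-benefit greedy for budgeted submodular maximization, seeded with the must-retain photos. All names (`sizes`, `weights`, `relevance`, `similarity`, `coverage_value`, `archive_plan`) and the toy numbers are assumptions; this is not the authors' method and carries none of the paper's guarantees.

```python
from typing import Dict, Set

# Hypothetical input model (not taken from the paper):
#   sizes[p]          storage cost of photo p (must be > 0)
#   weights[q]        importance of usage pattern q
#   relevance[q][p]   relevance of photo p to pattern q (may fold in quality)
#   similarity[p][s]  similarity in [0, 1] between photos p and s
Scores = Dict[str, Dict[str, float]]


def sim(similarity: Scores, p: str, s: str) -> float:
    """Similarity lookup; a photo is fully similar to itself, missing pairs count as 0."""
    if p == s:
        return 1.0
    return similarity.get(p, {}).get(s, 0.0)


def coverage_value(selected: Set[str], weights: Dict[str, float],
                   relevance: Scores, similarity: Scores) -> float:
    """Assumed facility-location score: every photo relevant to a usage pattern is
    credited by how similar its closest retained photo is."""
    total = 0.0
    for q, w in weights.items():
        for p, rel in relevance.get(q, {}).items():
            best = max((sim(similarity, p, s) for s in selected), default=0.0)
            total += w * rel * best
    return total


def archive_plan(sizes: Dict[str, float], must_keep: Set[str], budget: float,
                 weights: Dict[str, float], relevance: Scores,
                 similarity: Scores) -> Set[str]:
    """Cost-benefit greedy under a storage budget, seeded with must-retain photos."""
    def value(s: Set[str]) -> float:
        return coverage_value(s, weights, relevance, similarity)

    base = set(must_keep)
    base_used = sum(sizes[p] for p in base)
    if base_used > budget:
        raise ValueError("must-retain photos alone exceed the storage budget")

    kept, used, current = set(base), base_used, value(base)
    while True:
        best, best_ratio = None, 0.0
        for p in sorted(set(sizes) - kept):  # sorted for deterministic tie-breaking
            if used + sizes[p] > budget:
                continue
            # Marginal score gain per unit of storage.
            ratio = (value(kept | {p}) - current) / sizes[p]
            if ratio > best_ratio:
                best, best_ratio = p, ratio
        if best is None:  # nothing affordable improves the score
            break
        kept.add(best)
        used += sizes[best]
        current = value(kept)

    # Compare against the best single affordable addition: pure ratio greedy
    # can be misled when one large photo carries most of the value.
    singles = [base | {p} for p in sorted(set(sizes) - base)
               if base_used + sizes[p] <= budget]
    best_single = max(singles, key=value, default=base)
    return kept if current >= value(best_single) else best_single


# Toy example with made-up numbers.
sizes = {"a": 4, "b": 3, "c": 3, "d": 2}
weights = {"q1": 2.0, "q2": 1.0}
relevance = {"q1": {"a": 1.0, "b": 0.9}, "q2": {"c": 1.0, "d": 0.5}}
similarity = {"a": {"b": 0.8}, "b": {"a": 0.8}}
plan = archive_plan(sizes, {"d"}, 8, weights, relevance, similarity)
print(sorted(plan))  # ['b', 'c', 'd']
```

In the toy run, photo `b` is preferred over the near-duplicate `a` because it covers most of `a`'s value at lower cost, which leaves room for `c`. The singleton comparison is a common safeguard for ratio-based greedy under a knapsack budget; the paper's own algorithms and their worst-case and data-dependent guarantees may differ substantially.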

Original language: English
Title of host publication: Proceedings of the 26th International Conference on Extending Database Technology, EDBT 2023
Pages: 591-603
Number of pages: 13
Edition: 3
ISBN (Electronic): 9783893180929
State: Published - 20 Mar 2023
Event: 26th International Conference on Extending Database Technology, EDBT 2023 - Ioannina, Greece
Duration: 28 Mar 2023 - 31 Mar 2023

Publication series

Name: Advances in Database Technology - EDBT
Number: 3
Volume: 26

Conference

Conference: 26th International Conference on Extending Database Technology, EDBT 2023
Country/Territory: Greece
City: Ioannina
Period: 28/03/23 - 31/03/23

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications
