Data Completion in E-commerce

Liat Antwarg Friedman, Gal Lavee, Bracha Shapira, Dorin Shmaryahu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

In e-commerce, incomplete product data presents a significant challenge, particularly in online marketplaces where small businesses and individual sellers often lack the resources to provide full product details. Missing data in product items, whether in structured fields such as feature-value pairs or in unstructured text such as titles and descriptions, can hinder product search, recommendations, and overall marketplace functionality. Addressing these data gaps is essential for enhancing the efficiency and user experience of e-commerce platforms. This paper introduces low-cost machine learning approaches for completing missing textual features in such items, offering an alternative to computationally expensive large language models (LLMs). We propose two cost-effective methods: the first extracts data from unstructured fields such as item titles or descriptions, while the second employs a Nearest Neighbors method to impute missing values based on similar items. Both methods are evaluated on real-world datasets across diverse product categories, including sports trading cards, motor parts, and computers. Our experiments demonstrate that these low-cost approaches can achieve performance comparable to LLMs, often at a fraction of the computational cost, making them a viable option for large-scale e-commerce platforms. We also demonstrate that completing missing data not only improves data quality but also enhances key tasks such as search optimization and matching items to catalog products, which are critical for e-commerce platforms.
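The two ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`extract_from_title`, `impute_by_neighbors`), the whole-word matching against a known value list, the Jaccard similarity over title tokens, and the majority vote are all assumptions chosen only to keep the example small and dependency-free.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased whitespace tokens of a title, as a set."""
    return set(text.lower().split())


def extract_from_title(title, known_values):
    """Hypothetical extraction step: return the first known value found
    as a whole word or phrase in the title, or None if none is found."""
    title_lower = title.lower()
    # Try longer values first so a multi-word value beats its own prefix.
    for value in sorted(known_values, key=len, reverse=True):
        pattern = r"\b" + re.escape(value.lower()) + r"\b"
        if re.search(pattern, title_lower):
            return value
    return None


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def impute_by_neighbors(item, catalog, feature, k=3):
    """Hypothetical nearest-neighbors step: majority vote over the k items
    with the most similar titles that already have the feature filled in."""
    target = tokens(item["title"])
    candidates = [other for other in catalog if other.get(feature)]
    ranked = sorted(
        candidates,
        key=lambda other: jaccard(target, tokens(other["title"])),
        reverse=True,
    )
    votes = Counter(other[feature] for other in ranked[:k])
    return votes.most_common(1)[0][0] if votes else None


# Toy data, invented for illustration only.
catalog = [
    {"title": "Dell Latitude 5420 laptop 16GB RAM", "brand": "Dell"},
    {"title": "Dell Latitude 7420 laptop 8GB RAM", "brand": "Dell"},
    {"title": "HP EliteBook 840 laptop 16GB RAM", "brand": "HP"},
]
item = {"title": "Latitude 5420 laptop 16GB RAM refurbished"}

brand = extract_from_title(item["title"], ["HP", "Dell", "Lenovo"])
if brand is None:
    # The title does not mention a known brand, so fall back to neighbors.
    brand = impute_by_neighbors(item, catalog, "brand", k=3)
```

In this toy case the title contains no known brand, so the sketch falls back to the neighbor vote. A production system would presumably use learned representations and an approximate nearest-neighbor index rather than pairwise token overlap.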

Original language: American English
Title of host publication: Advances in Database Technology - EDBT
Pages: 1048-1056
Number of pages: 9
Edition: 3
ISBN (Electronic): 9783893180981, 9783893180998
State: Published - 10 Mar 2025
Event: 28th International Conference on Extending Database Technology, EDBT 2025 - Barcelona, Spain
Duration: 25 Mar 2025 to 28 Mar 2025

Publication series

Name: Advances in Database Technology - EDBT
Number: 3
Volume: 28

Conference

Conference: 28th International Conference on Extending Database Technology, EDBT 2025
Country/Territory: Spain
City: Barcelona
Period: 25/03/25 to 28/03/25

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Software
  • Computer Science Applications
