TY - GEN
T1 - Data Completion in E-commerce
AU - Friedman, Liat Antwarg
AU - Lavee, Gal
AU - Shapira, Bracha
AU - Shmaryahu, Dorin
N1 - Publisher Copyright: © 2025 OpenProceedings.org. All rights reserved.
PY - 2025/3/10
Y1 - 2025/3/10
AB - In e-commerce, incomplete product data presents a significant challenge, particularly in online marketplaces where small businesses and individual sellers often lack the resources to provide full product details. Missing data in product items, whether structured fields like feature-value pairs or unstructured text such as titles and descriptions, can hinder product search, recommendations, and overall marketplace functionality. Addressing these data gaps is essential for enhancing the efficiency and user experience of e-commerce platforms. This paper introduces low-cost machine learning approaches for completing missing textual features in such items, offering an alternative to computationally expensive large language models (LLMs). We propose two cost-effective methods: the first extracts data from unstructured fields like item titles or descriptions, while the second employs a Nearest Neighbors method to impute missing values based on similar items. Both methods are evaluated on real-world datasets across diverse product categories, including sports trading cards, motor parts, and computers. Our experiments demonstrate that these low-cost approaches can achieve performance comparable to LLMs, often at a fraction of the computational cost, making them a viable option for large-scale e-commerce platforms. We also demonstrate that completing missing data not only improves data quality but also enhances key tasks such as search optimization and matching items to catalog products, which are critical for e-commerce platforms.
UR - http://www.scopus.com/inward/record.url?scp=105007871386&partnerID=8YFLogxK
U2 - 10.48786/edbt.2025.88
DO - 10.48786/edbt.2025.88
M3 - Conference contribution
T3 - Advances in Database Technology - EDBT
SP - 1048
EP - 1056
BT - Advances in Database Technology - EDBT
T2 - 28th International Conference on Extending Database Technology, EDBT 2025
Y2 - 25 March 2025 through 28 March 2025
ER -