Is a picture worth a thousand words? A deep multi-modal architecture for product classification in e-commerce

Tom Zahavy, Abhinandan Krishnan, Alessandro Magnani, Shie Mannor

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Classifying products precisely and efficiently is a major challenge in modern e-commerce. The high traffic of new products uploaded daily and the dynamic nature of the categories raise the need for machine learning models that can reduce the cost and time of human editors. In this paper, we propose a decision-level fusion approach for multi-modal product classification based on text and image neural network classifiers. We train input-specific state-of-the-art deep neural networks for each input source, show the potential of forging them together into a multi-modal architecture, and train a novel policy network that learns to choose between them. Finally, we demonstrate that our multi-modal network improves classification accuracy over both networks on a real-world large-scale product classification dataset that we collected from Walmart.com. While we focus on image-text fusion that characterizes e-commerce businesses, our algorithms can be easily applied to other modalities such as audio, video, physical sensors, etc.
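The decision-level fusion idea in the abstract can be illustrated with a minimal sketch: two classifiers each emit class probabilities, and a policy decides per product which classifier's prediction to keep. Everything named here is an assumption for illustration and does not come from the paper: `fuse_decisions`, `confidence_policy`, and the toy probabilities are hypothetical. In particular, `confidence_policy` is a simple hand-written rule (pick the more confident classifier) standing in for the learned policy network the abstract describes.

```python
import numpy as np


def confidence_policy(text_probs, image_probs):
    """Hypothetical stand-in for a learned policy network.

    Returns 1 where the image classifier is more confident than the
    text classifier (higher maximum class probability), else 0.
    """
    return (image_probs.max(axis=1) > text_probs.max(axis=1)).astype(int)


def fuse_decisions(text_probs, image_probs, policy=confidence_policy):
    """Decision-level fusion: per product, keep the label predicted by
    whichever classifier the policy selects (0 = text, 1 = image)."""
    text_probs = np.asarray(text_probs, dtype=float)
    image_probs = np.asarray(image_probs, dtype=float)
    choice = policy(text_probs, image_probs)
    text_pred = text_probs.argmax(axis=1)
    image_pred = image_probs.argmax(axis=1)
    labels = np.where(choice == 1, image_pred, text_pred)
    return labels, choice


# Toy example: two products, three categories (made-up probabilities).
text_probs = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25]]
image_probs = [[0.30, 0.40, 0.30], [0.10, 0.10, 0.80]]
labels, choice = fuse_decisions(text_probs, image_probs)
print(labels, choice)
```

In this toy run the text classifier is kept for the first product and the image classifier for the second. A trained policy would replace `confidence_policy` by passing any callable with the same signature as the `policy` argument.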

Original language: English
Title of host publication: Proceedings of the 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018
Editors: G. Michael Youngblood, Karen Myers
Publisher: The AAAI Press
Pages: 7873-7880
Number of pages: 8
ISBN (Electronic): 9781577358008
State: Published - 2018
Event: 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018 - New Orleans, United States
Duration: 2 Feb 2018 to 7 Feb 2018

Publication series

Name: Proceedings of the 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018

Conference

Conference: 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018
Country/Territory: United States
City: New Orleans
Period: 2/02/18 to 7/02/18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
