Skip to main navigation Skip to search Skip to main content

Associating neural word embeddings with deep image representations using Fisher Vectors

Benjamin Klein, Guy Lev, Gil Sadeh, Lior Wolf

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

In recent years, the problem of associating a sentence with an image has gained a lot of attention. This work continues to push the envelope and makes further progress in the performance of image annotation and image search by a sentence tasks. In this work, we are using the Fisher Vector as a sentence representation by pooling the word2vec embedding of each word in the sentence. The Fisher Vector is typically taken as the gradients of the log-likelihood of descriptors, with respect to the parameters of a Gaussian Mixture Model (GMM). In this work we present two other Mixture Models and derive their Expectation-Maximization and Fisher Vector expressions. The first is a Laplacian Mixture Model (LMM), which is based on the Laplacian distribution. The second Mixture Model presented is a Hybrid Gaussian-Laplacian Mixture Model (HGLMM) which is based on a weighted geometric mean of the Gaussian and Laplacian distribution. Finally, by using the new Fisher Vectors derived from HGLMMs to represent sentences, we achieve state-of-the-art results for both the image annotation and the image search by a sentence tasks on four benchmarks: Pascal1K, Flickr8K, Flickr30K, and COCO.

Original languageEnglish
Title of host publicationIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
PublisherIEEE Computer Society
Pages4437-4446
Number of pages10
ISBN (Electronic)9781467369640
DOIs
StatePublished - 14 Oct 2015
EventIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 - Boston, United States
Duration: 7 Jun 201512 Jun 2015

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume07-12-June-2015

Conference

ConferenceIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Country/TerritoryUnited States
CityBoston
Period7/06/1512/06/15

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'Associating neural word embeddings with deep image representations using Fisher Vectors'. Together they form a unique fingerprint.

Cite this