Abstract: Gabriella Csurka

SSIP 2011

Gabriella Csurka:
BOV and Fisher Vectors in Large Scale Image Classification and Retrieval

First, I will briefly recall the baseline bag-of-visual-words (BOV) approach to image categorization and then present its extension using the Fisher Kernel representation. The main idea of the Fisher Kernel is to characterize a signal with a gradient vector derived from a generative probability model (in our case, a visual vocabulary built with a Gaussian mixture model, GMM).
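The two representations can be contrasted in a small sketch. It assumes a diagonal-covariance GMM whose parameters (`weights`, `means`, `variances`) have already been fitted to local descriptors, reuses the Gaussian means as the visual words for the hard-assignment BOV baseline, and uses the closed-form diagonal approximation of the Fisher information that is common in the Fisher vector literature. All function names and the toy GMM values are hypothetical, and post-processing steps such as power or L2 normalization are left out.

```python
import math


def gaussian_log_density(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at descriptor x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Soft assignments gamma_k(x) of descriptor x to each Gaussian (visual word)."""
    logs = [math.log(w) + gaussian_log_density(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def bov_histogram(descriptors, means):
    """Baseline BOV: hard-assign each descriptor to its nearest visual word, then
    return the normalized word-count histogram (K values).
    Assumption: the GMM means serve as the visual words."""
    hist = [0.0] * len(means)
    for x in descriptors:
        nearest = min(range(len(means)),
                      key=lambda k: sum((xi - mi) ** 2 for xi, mi in zip(x, means[k])))
        hist[nearest] += 1.0
    return [h / len(descriptors) for h in hist]


def fisher_vector(descriptors, weights, means, variances):
    """Fisher vector (2*K*D values): gradient of the average log-likelihood with respect
    to the GMM means and standard deviations, scaled by the closed-form diagonal
    approximation of the Fisher information."""
    T = len(descriptors)
    K, D = len(means), len(means[0])
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, variances)
        for k in range(K):
            for d in range(D):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])  # whitened deviation
                g_mu[k][d] += gamma[k] * z                  # first-order statistic
                g_sigma[k][d] += gamma[k] * (z * z - 1.0)   # second-order statistic
    fv = []
    for k in range(K):
        fv += [g / (T * math.sqrt(weights[k])) for g in g_mu[k]]
        fv += [g / (T * math.sqrt(2.0 * weights[k])) for g in g_sigma[k]]
    return fv


# Toy GMM with K = 2 Gaussians in D = 2 dimensions (hypothetical, not trained on real data).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
descriptors = [[0.2, -0.1], [3.8, 4.1], [4.2, 3.9]]

print(len(bov_histogram(descriptors, means)))                      # 2  (K)
print(len(fisher_vector(descriptors, weights, means, variances)))  # 8  (2 * K * D)
```

The BOV histogram has K entries (one count per visual word), whereas the Fisher vector keeps 2KD values: the first- and second-order deviations of the descriptors from each Gaussian, weighted by the soft assignments.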
The Fisher Kernel representation has the advantage of giving better performance than BOV at a lower computational cost. Because it is model-dependent (it is based on the visual vocabulary) but class-independent (an image signature), it is suitable for both supervised tasks (classification, semantic segmentation) and unsupervised ones (clustering, retrieval). I will show examples on different tasks.
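Because the signature does not depend on any class, the same vector can be compared directly between images without any training labels. A minimal retrieval sketch, assuming cosine similarity as the comparison measure and using toy four-dimensional vectors in place of real Fisher vectors (the names `cosine` and `retrieve` are hypothetical):

```python
import math


def cosine(u, v):
    """Cosine similarity, i.e. the dot product of the L2-normalized signatures."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def retrieve(query_signature, database_signatures, top_n=None):
    """Rank database images by decreasing similarity of their signatures to the query."""
    order = sorted(range(len(database_signatures)),
                   key=lambda i: cosine(query_signature, database_signatures[i]),
                   reverse=True)
    return order if top_n is None else order[:top_n]


# Toy signatures (hypothetical stand-ins for Fisher vectors).
query = [1.0, 0.0, 0.0, 1.0]
database = [[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.9], [1.0, 1.0, 0.0, 1.0]]
print(retrieve(query, database))  # [1, 2, 0]
```

The same signatures could equally be fed to a linear classifier for the supervised tasks; only the downstream use changes, not the representation.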
However, digital information is no longer mono-modal: web pages can contain text, images, animations, sound and video, and on a photo-sharing site the valuable content lies as much in the tags and comments as in the visual content of the photos themselves. Therefore, in the second part, after a brief introduction to text representations, I will present different information fusion techniques and show that, when image features are appropriately combined with textual information, the multi-modal system generally outperforms the mono-modal ones. I will finish the talk by showing some applications of multi-modal fusion strategies, such as image auto-annotation, multi-modal retrieval and content creation.
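Late fusion, i.e. combining similarity scores computed separately for each modality, is one common fusion strategy; the abstract does not say which techniques the talk covers, so this is only an illustrative sketch. It assumes a plain bag-of-words term-count vector as the text representation, a fixed mixing weight `alpha`, and made-up visual scores standing in for Fisher vector similarities; all names are hypothetical.

```python
import math
from collections import Counter


def text_similarity(doc_a, doc_b):
    """Cosine similarity between bag-of-words term-count vectors of two texts."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def late_fusion(visual_scores, text_scores, alpha=0.5):
    """Weighted linear combination of per-modality scores.
    Assumption: alpha is a hypothetical weight that would be tuned on held-out data."""
    return [alpha * v + (1.0 - alpha) * t for v, t in zip(visual_scores, text_scores)]


query_text = "red car on a street"
captions = ["a red car parked on the street", "a cat sleeping on a sofa"]
visual = [0.6, 0.7]  # hypothetical visual similarities of the two images to the query
text = [text_similarity(query_text, c) for c in captions]
fused = late_fusion(visual, text)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # 0
```

In this toy data the visual score alone prefers the second image, while the fused score ranks the red-car image first. In practice the two score distributions should be brought to comparable scales before mixing.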

Page last modified: July 11, 2011 2:53 PM