Paper in ECCV Workshop 2012: “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos”

Weakly Supervised Learning of Object Segmentations from Web-Scale Videos

  • G. Hartmann, M. Grundmann, J. Hoffman, D. Tsai, V. Kwatra, O. Madani, S. Vijayanarasimhan, I. Essa, J. Rehg, and R. Sukthankar (2012), “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos,” in Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media, 2012.
    @InProceedings{    2012-Hartmann-WSLOSFWV,
      author  = {Glenn Hartmann and Matthias Grundmann and Judy
          Hoffman and David Tsai and Vivek Kwatra and Omid
          Madani and Sudheendra Vijayanarasimhan and Irfan
          Essa and James Rehg and Rahul Sukthankar},
      booktitle  = {Proceedings of ECCV 2012 Workshop on Web-scale
          Vision and Social Media},
      doi    = {10.1007/978-3-642-33863-2_20},
      pdf    = {http://www.cs.cmu.edu/~rahuls/pub/eccv2012wk-cp-rahuls.pdf},
      title    = {Weakly Supervised Learning of Object Segmentations
          from Web-Scale Videos},
      year    = {2012}
    }

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classifiers are further refined using graphcuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
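To make the weak-supervision idea in the abstract concrete, here is a toy sketch, not the paper's actual pipeline. It assumes each spatio-temporal segment is already described by a feature vector (synthetic Gaussian features here), lets every segment inherit its video's tag as a noisy label, and trains a plain logistic regression on those labels. The function names, the feature model, and the top-quartile seed rule are all hypothetical choices for illustration, and the graph-cut refinement step is omitted.

```python
import numpy as np


def _with_bias(features):
    """Append a constant column so the classifier learns an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def train_segment_classifier(features, video_tags, lr=0.1, epochs=500):
    """Logistic regression in which each segment inherits its video's tag.

    The labels are weak: most segments of a tagged video are background.
    """
    X = _with_bias(features)
    y = video_tags.astype(float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)  # gradient step on the mean log-loss
    return w


def score_segments(features, w):
    """Per-segment probability of belonging to the tagged class."""
    return 1.0 / (1.0 + np.exp(-_with_bias(features) @ w))


def make_toy_data(seed=0, n_videos=40, segs_per_video=20, dim=5):
    """Synthetic data (an assumption): even-numbered videos carry the tag,
    and only the first 5 of their 20 segments actually show the object."""
    rng = np.random.default_rng(seed)
    feats, tags, is_object = [], [], []
    for v in range(n_videos):
        tagged = v % 2 == 0
        for s in range(segs_per_video):
            obj = tagged and s < 5
            feats.append(rng.normal(2.0 if obj else 0.0, 1.0, dim))
            tags.append(tagged)
            is_object.append(obj)
    return np.array(feats), np.array(tags), np.array(is_object)


features, tags, is_object = make_toy_data()
w = train_segment_classifier(features, tags)
scores = score_segments(features, w)

# Hypothetical seed rule: keep the top-scoring quarter of segments in tagged videos.
threshold = np.quantile(scores[tags], 0.75)
seeds = tags & (scores >= threshold)
seed_precision = is_object[seeds].mean()
print(f"seed precision on toy data: {seed_precision:.2f}")
```

Even though three quarters of the segments in tagged videos are mislabeled background, background-like segments also appear in untagged videos, so the classifier is pushed to score them low. Under these assumptions, the segments that survive as seeds are mostly true object segments, which is the kind of starting point a later refinement step could build on.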

Presented at: ECCV 2012 Workshop on Web-scale Vision and Social Media, October 7-12, 2012, in Florence, Italy.

Awarded the BEST PAPER AWARD!

Categories: Activity Recognition, Awards, Google, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra, WWW | Date: October 7th, 2012 | By: Irfan Essa
