Paper in IEEE CVPR 2013 “Geometric Context from Videos”

June 27th, 2013 Irfan Essa Posted in Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, S. Hussain Raza

  • S. H. Raza, M. Grundmann, and I. Essa (2013), “Geometric Context from Videos,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
    @InProceedings{    2013-Raza-GCFV,
      author  = {Syed Hussain Raza and Matthias Grundmann and Irfan
          Essa},
      booktitle  = {{Proceedings of IEEE Conference on Computer Vision
          and Pattern Recognition (CVPR)}},
      doi    = {10.1109/CVPR.2013.396},
      month    = {June},
      organization  = {IEEE Computer Society},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2013-Raza-GCFV.pdf},
      title    = {Geometric Context from Videos},
      url    = {http://www.cc.gatech.edu/cpl/projects/videogeometriccontext/},
      video    = {http://www.youtube.com/watch?v=EXPmgKHPJ64},
      year    = {2013}
    }

Abstract

We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes. Leveraging spatio-temporal video segmentation, we decompose a dynamic scene captured by a video into geometric classes, based on predictions made by region classifiers that are trained on appearance and motion features. By examining the homogeneity of the prediction, we combine predictions across multiple segmentation hierarchy levels, alleviating the need to determine the granularity a priori. We built a novel, extensive dataset on geometric context of video to evaluate our method, consisting of over 100 ground-truth annotated outdoor videos with over 20,000 frames. To further scale beyond this dataset, we propose a semi-supervised learning framework to expand the pool of labeled data with high-confidence predictions obtained from unlabeled data. Our system produces an accurate prediction of geometric context of video, achieving 96% accuracy across main geometric classes.
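
The semi-supervised expansion step lends itself to a compact illustration. Below is a minimal self-training sketch in Python (assuming scikit-learn); the random-forest region classifier, the 0.95 confidence threshold, and the round count are stand-ins for illustration, not the paper's implementation.

    # Self-training sketch: grow the labeled pool with high-confidence
    # predictions on unlabeled region features, then retrain.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, rounds=3):
        X, y = X_labeled.copy(), y_labeled.copy()
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        for _ in range(rounds):
            clf.fit(X, y)
            if len(X_unlabeled) == 0:
                break
            proba = clf.predict_proba(X_unlabeled)
            keep = proba.max(axis=1) >= threshold  # high-confidence predictions only
            if not keep.any():
                break
            X = np.vstack([X, X_unlabeled[keep]])
            y = np.concatenate([y, clf.classes_[proba[keep].argmax(axis=1)]])
            X_unlabeled = X_unlabeled[~keep]       # move them out of the unlabeled pool
        return clf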

via IEEE Xplore – Geometric Context from Videos.

Paper in IEEE CVPR 2013 “Decoding Children’s Social Behavior”

June 27th, 2013 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Denis Lantsman, Gregory Abowd, James Rehg, PAMI/ICCV/CVPR/ECCV, Papers, Thomas Ploetz

  • J. M. Rehg, G. D. Abowd, A. Rozga, M. Romero, M. A. Clements, S. Sclaroff, I. Essa, O. Y. Ousley, Y. Li, C. Kim, H. Rao, J. C. Kim, L. L. Presti, J. Zhang, D. Lantsman, J. Bidwell, and Z. Ye (2013), “Decoding Children’s Social Behavior,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
    @InProceedings{    2013-Rehg-DCSB,
      author  = {James M. Rehg and Gregory D. Abowd and Agata Rozga
          and Mario Romero and Mark A. Clements and Stan
          Sclaroff and Irfan Essa and Opal Y. Ousley and Yin
          Li and Chanho Kim and Hrishikesh Rao and Jonathan C.
          Kim and Liliana Lo Presti and Jianming Zhang and
          Denis Lantsman and Jonathan Bidwell and Zhefan Ye},
      booktitle  = {{Proceedings of IEEE Conference on Computer Vision
          and Pattern Recognition (CVPR)}},
      doi    = {10.1109/CVPR.2013.438},
      month    = {June},
      organization  = {IEEE Computer Society},
      pdf    = {http://www.cc.gatech.edu/~rehg/Papers/Rehg_CVPR13.pdf},
      title    = {Decoding Children's Social Behavior},
      url    = {http://www.cbi.gatech.edu/mmdb/},
      year    = {2013}
    }

Abstract

We introduce a new problem domain for activity recognition: the analysis of children’s social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1-2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.

Full database available from http://www.cbi.gatech.edu/mmdb/

via IEEE Xplore – Decoding Children’s Social Behavior.

Paper in IEEE CVPR 2013 “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition”

June 27th, 2013 Irfan Essa Posted in Activity Recognition, Behavioral Imaging, Grant Schindler, PAMI/ICCV/CVPR/ECCV, Papers, Sports Visualization, Thomas Ploetz, Vinay Bettadapura

  • V. Bettadapura, G. Schindler, T. Ploetz, and I. Essa (2013), “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [arXiv] [BIBTEX]
    @InProceedings{    2013-Bettadapura-ABDDTSIAR,
      arxiv    = {http://arxiv.org/abs/1510.02071},
      author  = {Vinay Bettadapura and Grant Schindler and Thomas
          Ploetz and Irfan Essa},
      booktitle  = {{Proceedings of IEEE Conference on Computer Vision
          and Pattern Recognition (CVPR)}},
      doi    = {10.1109/CVPR.2013.338},
      month    = {June},
      organization  = {IEEE Computer Society},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2013-Bettadapura-ABDDTSIAR.pdf},
      title    = {Augmenting Bag-of-Words: Data-Driven Discovery of
          Temporal and Structural Information for Activity
          Recognition},
      url    = {http://www.cc.gatech.edu/cpl/projects/abow/},
      year    = {2013}
    }

Abstract

We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations, where we successfully recognize activities and detect anomalies in four complex datasets.
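
The regular-expression idea can be sketched compactly. The Python fragment below is a hedged illustration, not the paper's code: it augments a plain BoW histogram over a discretized event sequence with match counts of randomly sampled patterns, which capture temporal ordering that BoW discards. The single-character codewords and pattern lengths are assumptions.

    import random, re
    from collections import Counter

    def bow_histogram(seq, vocab):
        counts = Counter(seq)
        return [counts[w] for w in vocab]

    def sample_regexes(vocab, n_patterns=50, max_len=3, seed=0):
        rng = random.Random(seed)
        patterns = []
        for _ in range(n_patterns):
            parts = [rng.choice(vocab) for _ in range(rng.randint(2, max_len))]
            # '.*' between symbols tolerates gaps between the matched events
            patterns.append(re.compile(".*".join(parts)))
        return patterns

    def augmented_features(seq, vocab, patterns):
        s = "".join(seq)  # event sequence as a string of codeword characters
        return bow_histogram(seq, vocab) + [len(p.findall(s)) for p in patterns]

    # Example: codewords 'a'..'d' stand for quantized visual words over time
    vocab = list("abcd")
    patterns = sample_regexes(vocab)
    features = augmented_features(list("abacdbad"), vocab, patterns)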

via IEEE Xplore – Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition.

Paper in ECCV Workshop 2012: “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos”

October 7th, 2012 Irfan Essa Posted in Activity Recognition, Awards, Google, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra, WWW

Weakly Supervised Learning of Object Segmentations from Web-Scale Videos

  • G. Hartmann, M. Grundmann, J. Hoffman, D. Tsai, V. Kwatra, O. Madani, S. Vijayanarasimhan, I. Essa, J. Rehg, and R. Sukthankar (2012), “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos,” in Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media, 2012. [PDF] [DOI] [BIBTEX]
    @InProceedings{    2012-Hartmann-WSLOSFWV,
      author  = {Glenn Hartmann and Matthias Grundmann and Judy
          Hoffman and David Tsai and Vivek Kwatra and Omid
          Madani and Sudheendra Vijayanarasimhan and Irfan
          Essa and James Rehg and Rahul Sukthankar},
      booktitle  = {Proceedings of ECCV 2012 Workshop on Web-scale
          Vision and Social Media},
      doi    = {10.1007/978-3-642-33863-2_20},
      pdf    = {http://www.cs.cmu.edu/~rahuls/pub/eccv2012wk-cp-rahuls.pdf},
      title    = {Weakly Supervised Learning of Object Segmentations
          from Web-Scale Videos},
      year    = {2012}
    }

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classifiers are further refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
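
The segment-level training loop is straightforward to sketch. The Python fragment below is a rough illustration under stated assumptions (feature extraction and the graph-cut refinement are omitted; LogisticRegression and the 0.9 seed threshold are stand-ins): each spatio-temporal segment inherits its video's noisy tag, a per-tag classifier is trained, and only high-scoring segments become object seeds.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_segment_classifier(segment_feats, video_tags, tag):
        # Noisy labels: positives are all segments from videos carrying `tag`,
        # negatives are segments from all other videos.
        y = np.array([1 if t == tag else 0 for t in video_tags])
        clf = LogisticRegression(max_iter=1000)
        clf.fit(segment_feats, y)
        return clf

    def object_seeds(clf, segment_feats, min_score=0.9):
        scores = clf.predict_proba(segment_feats)[:, 1]
        return np.where(scores >= min_score)[0]  # seeds handed to graph cuts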

Presented at: ECCV 2012 Workshop on Web-scale Vision and Social Media, October 7-12, 2012, in Florence, Italy.

Awarded the BEST PAPER AWARD!

 

At CVPR 2012, in Providence, RI, June 16 – 21, 2012

June 17th, 2012 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Presentations, Vivek Kwatra

IEEE CVPR 2012 is in Providence, RI, from June 16-21, 2012.

Busy week ahead meeting good friends and colleagues. Here are some highlights of what my group is involved with.

Paper in Main Conference

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [Poster on Tuesday 6/19/2012]

Demo in Main Conference

  • M. Grundmann, V. Kwatra, D. Castro, and I. Essa (2012), “Calibration-Free Rolling Shutter Removal,” in Proceedings of IEEE International Conference on Computational Photography (ICCP), 2012. [WEBSITE] [VIDEO] [Demo on Monday and Tuesday (6/18-19) at the Google Booth]

Invited Talk in Workshop

  • I. Essa (2012), “Extracting Content and Context from Video,” invited talk at the CVPR 2012 Workshop on Large Scale Video Search and Mining, June 21, 2012 (abstract in the post below).

Paper in IEEE CVPR 2012: “Detecting Regions of Interest in Dynamic Scenes with Camera Motions”

June 16th, 2012 Irfan Essa Posted in Activity Recognition, Kihwan Kim, Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, PERSEAS, Visual Surveillance

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [DOI] [BLOG] [BIBTEX]
    @InProceedings{    2012-Kim-DRIDSWCM,
      author  = {Kihwan Kim and Dongryeol Lee and Irfan Essa},
      blog    = {http://prof.irfanessa.com/2012/04/09/paper-cvpr2012/},
      booktitle  = {Proceedings of IEEE Conference on Computer Vision
          and Pattern Recognition (CVPR)},
      doi    = {10.1109/CVPR.2012.6247809},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2012-Kim-DRIDSWCM.pdf},
      publisher  = {IEEE Computer Society},
      title    = {Detecting Regions of Interest in Dynamic Scenes
          with Camera Motions},
      url    = {http://www.cc.gatech.edu/cpl/projects/roi/},
      video    = {http://www.youtube.com/watch?v=19BMwDMCSp8},
      year    = {2012}
    }

Abstract

We present a method to detect the regions of interest in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting important future regions of interest as the scene evolves dynamically.

We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient and provides better prediction than previously proposed RBF-based approaches.
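
The core fitting step can be reproduced with off-the-shelf tools. The snippet below is an illustrative sketch, not the authors' code: it fits a Gaussian process that maps scene positions to expected velocities, yielding a stochastic vector field with per-point uncertainty. The RBF kernel, its length scale, and the synthetic track data are assumptions.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Positions (x, y) of tracked objects and their observed velocities (vx, vy);
    # random stand-ins take the place of real track data here.
    positions = np.random.rand(200, 2)
    velocities = np.sin(positions * 3.0)

    kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.05)
    gp = GaussianProcessRegressor(kernel=kernel).fit(positions, velocities)

    # Query the stochastic field on a grid; the predictive std marks regions
    # where the extracted motion tendency is uncertain.
    grid = np.stack(np.meshgrid(np.linspace(0, 1, 20),
                                np.linspace(0, 1, 20)), -1).reshape(-1, 2)
    mean_flow, std = gp.predict(grid, return_std=True)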

Presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, Providence, RI, June 16-21, 2012

Presentation at CVPR 2012 workshop on Large Scale Video Search and Mining “Extracting Content and Context from Video.”

June 5th, 2012 Irfan Essa Posted in Activity Recognition, PAMI/ICCV/CVPR/ECCV

Extracting Content and Context from Video.

(Presented at the CVPR 2012 Workshop on Large Scale Video Search and Mining, June 21, 2012)

Irfan Essa
Georgia Tech
prof.irfanessa.com

Abstract

In this talk, I will describe various efforts aimed at extracting context and content from video. I will highlight some of our recent work in extracting spatio-temporal features and the related saliency information from video, which can be used to detect and localize regions of interest. Then I will describe approaches that use structured and unstructured representations to recognize complex and extended-time actions. I will also discuss the need for unsupervised activity discovery and the detection of anomalous activities in videos. I will show a variety of examples, including online videos, mobile videos, surveillance and home-monitoring videos, and sports videos. Finally, I will pose a series of questions and make observations about how we need to extend our current paradigms of video understanding to go beyond local spatio-temporal features and standard time-series and bag-of-words models.

Paper in ICCV 2011: “Gaussian Process Regression Flow for Analysis of Motion Trajectories”

October 28th, 2011 Irfan Essa Posted in Activity Recognition, DARPA, Kihwan Kim, PAMI/ICCV/CVPR/ECCV, Papers

Gaussian Process Regression Flow for Analysis of Motion Trajectories

  • K. Kim, D. Lee, and I. Essa (2011), “Gaussian Process Regression Flow for Analysis of Motion Trajectories,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011. [PDF] [WEBSITE] [VIDEO] [BIBTEX]
    @InProceedings{    2011-Kim-GPRFAMT,
      author  = {Kihwan Kim and Dongryeol Lee and Irfan Essa},
      booktitle  = {Proceedings of IEEE International Conference on
          Computer Vision (ICCV)},
      month    = {November},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2011-Kim-GPRFAMT.pdf},
      publisher  = {IEEE Computer Society},
      title    = {Gaussian Process Regression Flow for Analysis of
          Motion Trajectories},
      url    = {http://www.cc.gatech.edu/cpl/projects/gprf/},
      video    = {http://www.youtube.com/watch?v=UtLr37hDQz0},
      year    = {2011}
    }

Abstract

Analysis and recognition of motions and activities of objects in videos requires effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian process regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data.

Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. This representation also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic-monitoring domains and test on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates.
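
The anomaly-detection use of the representation fits in a few lines. This hedged sketch (the scoring rule and names are illustrative, not the paper's method) scores how well an online trajectory agrees with a GP flow field fit as in the sketch in the CVPR 2012 post above: observed velocities are standardized against the field's predictive mean and uncertainty, and large deviations flag atypical motion.

    import numpy as np

    def trajectory_anomaly_score(gp, traj):
        """traj: (T, 2) array of positions sampled over time."""
        observed_v = np.diff(traj, axis=0)   # finite-difference velocities
        pred_v, std = gp.predict(traj[:-1], return_std=True)
        if std.ndim == 1:                    # older sklearn returns 1-D std
            std = std[:, None]
        z = np.abs(observed_v - pred_v) / (std + 1e-8)
        return z.mean()                      # higher = less typical motion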

DEMO (2011): Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths – from Google Research Blog

June 20th, 2011 Irfan Essa Posted in Computational Photography and Video, In The News, Matthias Grundmann, Mobile Computing, PAMI/ICCV/CVPR/ECCV, Vivek Kwatra

via Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths – Google Research Blog.

Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths
Posted by Matthias Grundmann, Vivek Kwatra, and Irfan Essa

Earlier this year, we announced the launch of new features on the YouTube Video Editor, including stabilization for shaky videos, with the ability to preview them in real-time. The core technology behind this feature is detailed in this paper, which will be presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).

Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. On the other hand, most professionally shot videos consist of carefully designed camera configurations, using specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our goal was to devise a completely automatic method for converting casual shaky footage into more pleasant and professional-looking videos.

Our technique mimics the cinematographic principles outlined above by automatically determining the best camera path using a robust optimization technique. The original, shaky camera path is divided into a set of segments, each approximated by either a constant, linear or parabolic motion. Our optimization finds the best of all possible partitions using a computationally efficient and stable algorithm.

To achieve real-time performance on the web, we distribute the computation across multiple machines in the cloud. This enables us to provide users with a real-time preview and interactive control of the stabilized result. Above we provide a video demonstration of how to use this feature on the YouTube Editor. We will also demo this live at Google’s exhibition booth in CVPR 2011.

For more details see the Project Site. See the video of the system on YouTube, the paper in PDF, and a technical video of the work.

Paper (2011) in IEEE CVPR: “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths”

June 19th, 2011 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra

Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths

  • M. Grundmann, V. Kwatra, and I. Essa (2011), “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. [PDF] [WEBSITE] [VIDEO] [DEMO] [Google Research Blog] [BIBTEX]
    @InProceedings{    2011-Grundmann-AVSWROCP,
      author  = {Matthias Grundmann and Vivek Kwatra and Irfan Essa},
      booktitle  = {Proceedings of IEEE Conference on Computer Vision
          and Pattern Recognition (CVPR)},
      month    = {June},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2011-Grundmann-AVSWROCP},
      publisher  = {IEEE Computer Society},
      title    = {Auto-Directed Video Stabilization with Robust L1
          Optimal Camera Paths},
      url    = {http://www.cc.gatech.edu/cpl/projects/videostabilization/},
      video    = {http://www.youtube.com/watch?v=i5keG1Y810U},
      year    = {2011}
    }

Abstract

We present a novel algorithm for automatically applying constrainable, L1-optimal camera paths to generate stabilized videos by removing undesired motions. Our goal is to compute camera paths that are composed of constant, linear and parabolic segments mimicking the camera motions employed by professional cinematographers. To this end, our algorithm is based on a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond the conventional filtering of camera paths that only suppresses high-frequency jitter. We incorporate additional constraints on the path of the camera directly into our algorithm, allowing for stabilized and retargeted videos. Our approach accomplishes this without the need for user interaction or costly 3D reconstruction of the scene, and works as a post-process for videos from any camera or from an online source.
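
The optimization itself is compact enough to sketch. Below is a 1-D toy version in Python using cvxpy; the solver, derivative weights, and crop-window bound are assumptions, not the authors' linear-programming implementation. Minimizing L1 norms of the first, second, and third differences drives the optimal path toward piecewise constant, linear, and parabolic segments.

    import numpy as np
    import cvxpy as cp

    shaky = np.cumsum(np.random.randn(300))  # stand-in 1-D camera trajectory
    p = cp.Variable(300)                     # smoothed camera path to solve for

    w1, w2, w3 = 10.0, 1.0, 100.0            # derivative weights (assumed)
    cost = (w1 * cp.norm1(cp.diff(p, 1)) +   # favors constant segments
            w2 * cp.norm1(cp.diff(p, 2)) +   # favors linear segments
            w3 * cp.norm1(cp.diff(p, 3)))    # favors parabolic segments
    # Keep the smoothed path within a crop window around the original path
    constraints = [cp.abs(p - shaky) <= 20]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    stabilized = p.value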
