
Paper (2011) in IEEE CVPR: “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths”

June 19th, 2011 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra

Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths

  • Grundmann, Kwatra, and Essa (2011), “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. [PDF] [WEBSITE] [VIDEO] [DEMO] [Google Research Blog] [BIBTEX]
    @inproceedings{2011-Grundmann-AVSWROCP,
      Author = {M. Grundmann and V. Kwatra and I. Essa},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Month = {June},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Grundmann-AVSWROCP.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths},
      Url = {http://www.cc.gatech.edu/cpl/projects/videostabilization/},
      Video = {http://www.youtube.com/watch?v=i5keG1Y810U},
      Year = {2011}}

Abstract

We present a novel algorithm for automatically applying constrainable, L1-optimal camera paths to generate stabilized videos by removing undesired motions. Our goal is to compute camera paths that are composed of constant, linear and parabolic segments mimicking the camera motions employed by professional cinematographers. To this end, our algorithm is based on a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond the conventional filtering of camera paths that only suppresses high frequency jitter. We incorporate additional constraints on the path of the camera directly in our algorithm, allowing for stabilized and retargeted videos. Our approach accomplishes this without the need for user interaction or costly 3D reconstruction of the scene, and works as a post-process for videos from any camera or from an online source.
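The abstract describes an objective built from the first, second, and third derivatives of the camera path. As a minimal sketch, assuming a 1D (translation-only) path sampled once per frame, the snippet below only evaluates that L1 objective with finite differences; it does not solve the linear program. The unit weights `w1`, `w2`, `w3` and the `within_crop_window` check are illustrative assumptions, not the paper's settings.

```python
from typing import List, Sequence


def finite_difference(path: Sequence[float], order: int) -> List[float]:
    """Return the `order`-th forward difference of a 1D camera path."""
    diff = [float(v) for v in path]
    for _ in range(order):
        diff = [b - a for a, b in zip(diff, diff[1:])]
    return diff


def l1_path_cost(path: Sequence[float], w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    """Weighted L1 norms of the first three differences (illustrative weights)."""
    return (w1 * sum(abs(d) for d in finite_difference(path, 1))
            + w2 * sum(abs(d) for d in finite_difference(path, 2))
            + w3 * sum(abs(d) for d in finite_difference(path, 3)))


def within_crop_window(smooth: Sequence[float], original: Sequence[float], radius: float) -> bool:
    """Hypothetical constraint: the smoothed path may not stray more than `radius` from the original."""
    return all(abs(s - o) <= radius for s, o in zip(smooth, original))


if __name__ == "__main__":
    shaky = [0, 2, -1, 3, 0, 2]
    print(l1_path_cost(shaky))                            # 76.0
    print(l1_path_cost([1] * 6))                          # 0.0 (constant path)
    print(l1_path_cost([0, 1, 2, 3, 4, 5]))               # 5.0 (linear: only first differences)
    print(within_crop_window([1] * 6, shaky, radius=2))   # True
```

A constant path costs nothing, a linear path pays only the first-difference term, and a parabolic path has zero third difference, which mirrors the constant, linear, and parabolic segments named above. Because the cost uses absolute values rather than squares, minimizing it tends to drive whole runs of differences to exactly zero; an LP solver handles the absolute values by introducing slack variables.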


Presentation (2011) at IbPRIA 2011: “Spatio-Temporal Video Analysis and Visual Activity Recognition”

June 8th, 2011 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Presentations

“Spatio-Temporal Video Analysis and Visual Activity Recognition,” a presentation at the Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA) 2011, Las Palmas de Gran Canaria, Spain, June 8-10, 2011.

Abstract

My research group is focused on a variety of approaches for (a) low-level video analysis and synthesis and (b) recognizing activities in videos. In this talk, I will concentrate on two of our recent efforts: one aimed at robust spatio-temporal segmentation of video, and another that uses motion and flow to recognize and predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this work, we begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations that are temporally coherent, with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video retargeting that use spatio-temporal saliency derived from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research.)
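The segment / build-region-graph / repeat loop described above can be sketched as follows. This is a minimal illustration, not the published algorithm: nodes stand in for voxels, edge weights for appearance dissimilarity, a fixed per-level `threshold` replaces the adaptive merge criterion of the cited work, and the region-graph edge weight is simply the cheapest boundary edge (the cited work describes regions by appearance instead). Flow-guided temporal edges are not modeled.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, int, int]  # (dissimilarity weight, node_a, node_b)


class DisjointSet:
    """Union-find with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def segment_level(num_nodes: int, edges: List[Edge], threshold: float) -> List[int]:
    """Merge nodes joined by edges no heavier than `threshold`; return a compact label per node."""
    ds = DisjointSet(num_nodes)
    for w, a, b in sorted(edges):
        if w <= threshold:
            ds.union(a, b)
    roots: Dict[int, int] = {}
    return [roots.setdefault(ds.find(v), len(roots)) for v in range(num_nodes)]


def region_graph(labels: List[int], edges: List[Edge]) -> List[Edge]:
    """Connect neighboring regions; weight = cheapest boundary edge (a simplifying assumption)."""
    best: Dict[Tuple[int, int], float] = {}
    for w, a, b in edges:
        la, lb = labels[a], labels[b]
        if la != lb:
            key = (min(la, lb), max(la, lb))
            best[key] = min(w, best.get(key, w))
    return [(w, a, b) for (a, b), w in best.items()]


def hierarchical_segmentation(num_nodes: int, edges: List[Edge], thresholds: List[float]) -> List[List[int]]:
    """Return one labeling of the original nodes per level, coarser at each level."""
    levels: List[List[int]] = []
    node_labels = list(range(num_nodes))
    n, current_edges = num_nodes, edges
    for t in thresholds:
        region_labels = segment_level(n, current_edges, t)
        node_labels = [region_labels[r] for r in node_labels]  # map original nodes to new regions
        levels.append(node_labels)
        current_edges = region_graph(region_labels, current_edges)
        n = max(region_labels) + 1
    return levels
```

Each level's labels refer back to the original nodes, so a downstream application can pick whichever granularity it needs, as the abstract notes.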

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the playing field. To this end, we propose a novel approach to detect the locations where the play will evolve, e.g., where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground-level sparse movement of players in each time-step, and then generate a dense motion field. Using this field, we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach for soccer, basketball, and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research.)

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website.


Google PhD Fellowship for Matthias Grundmann

May 16th, 2011 Irfan Essa Posted in Awards, In The News, Matthias Grundmann

Congratulations to Matthias Grundmann, winner of the Google PhD Fellowship in Computer Vision for 2012.

via PhD Fellowships – Google Research.

Google PhD Fellowship Program Overview

Nurturing and maintaining strong relations with the academic community is a top priority at Google. The Google U.S./Canada PhD Student Fellowship Program was created to recognize outstanding graduate students doing exceptional work in computer science, related disciplines, or promising research areas. Last year we awarded 14 unique fellowships to some amazing students in the US and Canada:

  • Matthias Grundmann, Google U.S./Canada Fellowship in Computer Vision (Georgia Institute of Technology)

Going Live on YouTube (2011): Lights, Camera… EDIT! New Features for the YouTube Video Editor

March 21st, 2011 Irfan Essa Posted in Computational Photography and Video, Google, In The News, Matthias Grundmann, Multimedia, Vivek Kwatra, WWW

via YouTube Blog: Lights, Camera… EDIT! New Features for the YouTube Video Editor.

  • M. Grundmann, V. Kwatra, and I. Essa (2011), “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. [PDF] [WEBSITE] [VIDEO] [DEMO] [BLOG] [BIBTEX]
    @inproceedings{2011-Grundmann-AVSWROCP,
      Author = {M. Grundmann and V. Kwatra and I. Essa},
      Blog = {http://prof.irfanessa.com/2011/06/19/videostabilization/},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Modified = {2011-12-08 22:13:20 +0000},
      Demo = {http://www.youtube.com/watch?v=0MiY-PNy-GU},
      Month = {June},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Grundmann-AVSWROCP.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths},
      Url = {http://www.cc.gatech.edu/cpl/projects/videostabilization/},
      Video = {http://www.youtube.com/watch?v=i5keG1Y810U},
      Year = {2011},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/videostabilization/}}

Lights, Camera… EDIT! New Features for the YouTube Video Editor

Nine months ago we launched our cloud-based video editor. It was a simple product built to provide our users with basic editing tools. Although it didn’t have all the features available in paid desktop editing software, the idea was that the vast majority of people’s video editing needs are pretty basic and straightforward, and we could provide these features with a free editor available on the Web. Since launch, hundreds of thousands of videos have been published using the YouTube Video Editor, and we’ve regularly pushed out new feature enhancements to the product, including:

  • Video transitions (crossfade, wipe, slide)
  • The ability to save projects across sessions
  • Increased clips allowed in the editor from 6 to 17
  • Video rotation (from portrait to landscape and vice versa – great for videos shot on mobile)
  • Shape transitions (heart, star, diamond, and Jack-O-Lantern for Halloween)
  • Audio mixing (AudioSwap track mixed with original audio)
  • Effects (brightness/contrast, black & white)
  • A new user interface and project menu for multiple saved projects

While many of these are familiar features also available on desktop software, today, we’re excited to unveil two new features that the team has been working on over the last couple of months that take unique advantage of the cloud:

Stabilizer

Ever shoot a shaky video that’s so jittery, it’s actually hard to watch? Professional cinematographers use stabilization equipment such as tripods or camera dollies to keep their shots smooth and steady. Our team mimicked these cinematographic principles by automatically determining the best camera path for you through a unified optimization technique. In plain English, you can smooth some of those unsteady videos with the click of a button. We also wanted you to be able to preview these results in real time, before publishing the finished product to the Web. We do this by harnessing the power of the cloud: we split the computation required for stabilizing the video into chunks and distribute them across different servers. This allows us to use the power of many machines in parallel, computing and streaming the stabilized results quickly into the preview. You can check out the paper we’re publishing, entitled “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths.” Want to see the stabilizer in action? You can test it out for yourself, or check out these two videos. The first is without the stabilizer.
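The post says the stabilization work is split into chunks and distributed across servers, but it does not say how chunks are formed or stitched back together. The sketch below is a hypothetical illustration of the general pattern only: threads stand in for servers, a centered moving average stands in for the per-chunk stabilization, and `chunk_size` and `overlap` are made-up parameters. Each chunk is padded with context frames on both sides, and only its core frames are kept.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Chunk = Tuple[int, int, int, int]  # (padded_start, padded_end, core_start, core_end)


def make_chunks(num_frames: int, chunk_size: int, overlap: int) -> List[Chunk]:
    """Split frames into core ranges, each padded by `overlap` frames of context on both sides."""
    chunks = []
    for start in range(0, num_frames, chunk_size):
        end = min(start + chunk_size, num_frames)
        chunks.append((max(0, start - overlap), min(num_frames, end + overlap), start, end))
    return chunks


def moving_average(values: Sequence[float], radius: int) -> List[float]:
    """Stand-in for per-chunk stabilization: centered mean, window clipped at the sequence ends."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def process_in_chunks(values: Sequence[float], fn: Callable[[Sequence[float]], List[float]],
                      chunk_size: int, overlap: int, workers: int = 4) -> List[float]:
    """Run `fn` on padded chunks in parallel and keep only each chunk's core frames."""
    def run(chunk: Chunk) -> List[float]:
        p0, p1, c0, c1 = chunk
        result = fn(values[p0:p1])
        return result[c0 - p0: c1 - p0]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in chunk order, so concatenation restores frame order
        parts = list(pool.map(run, make_chunks(len(values), chunk_size, overlap)))
    return [x for part in parts for x in part]
```

For a local filter like this one, an overlap at least as large as the filter radius makes the chunked result identical to processing the whole sequence at once. A global L1 path optimization does not decompose this cleanly, so a real system would need some way to reconcile chunk boundaries.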

And now, with the stabilizer:


Funding (2011): NSF (1059362): “II-New: Motion Grammar Laboratory”

March 1st, 2011 Irfan Essa Posted in Henrik Christensen, Mike Stilman, NSF

II-New: Motion Grammar Laboratory (Stilman, Essa, Egerstadt, Christensen, Ueda), NSF Division of Computer and Network Systems (CNS) Instrumentation Grant.

An anthropomorphic robot arm and a human capture system enable the autonomous performance of assembly tasks with significant uncertainty in problem specifications and environments. This line of work is investigated through sequences of manipulation actions where the guarantee of the completion of task-level objectives is rooted in the discovery of the semantic structure of human manipulation. New research directions in anthropomorphic robotics are explored, including programming by demonstration, activity recognition, control, estimation, and planning.

The Motion Grammar Laboratory infrastructure provides a great opportunity for research and education. New classroom experiences for undergraduate and graduate students provide practical experience in robot-human interaction and activity process sharing. This opens possibilities for human training and rehabilitation, as well as assistive personal robotics, and opens the door to a host of technological innovations.

via Award#1059362 – II-New: Motion Grammar Laboratory.


Paper (2011) in Virtual Reality: “Augmenting aerial earth maps with dynamic information from videos”

February 2nd, 2011 Irfan Essa Posted in Computational Photography and Video, Kihwan Kim, Papers, Sangmin Oh

Augmenting aerial earth maps with dynamic information from videos

  • Kim, Oh, Lee, and Essa (2011), “Augmenting aerial earth maps with dynamic information from videos,” Journal of Virtual Reality, Special Issue on Augmented Reality, vol. 15, iss. 2-3, ISSN 1359-4338, 2011. [PDF] [WEBSITE] [VIDEO] [DOI] [SpringerLink] [BIBTEX]
    
    @article{2011-Kim-AAEMWDIFV,
     Author = {K. Kim and S. Oh and J. Lee and I. Essa},
     Doi = {10.1007/s10055-010-0186-2},
     Journal = {Journal of Virtual Reality, Special Issue on Augmented Reality},
     Number = {2-3},
     Issn = {1359-4338},
     Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Kim-AAEMWDIFV.pdf},
     Title = {Augmenting aerial earth maps with dynamic information from videos},
     Url = {http://www.cc.gatech.edu/cpl/projects/augearth},
     Video = {http://www.youtube.com/watch?v=TPk88soc2qw},
     Volume = {15},
     Year = {2011}}

Abstract

We introduce methods for augmenting aerial visualizations of Earth (from tools such as Google Earth or Microsoft Virtual Earth) with dynamic information obtained from videos. Our goal is to make Augmented Earth Maps that visualize plausible live views of dynamic scenes in a city. We propose different approaches to analyze videos of pedestrians and cars in real situations, under differing conditions, to extract dynamic information. Then, we augment Aerial Earth Maps (AEMs) with the extracted live and dynamic content. We also analyze natural phenomena (skies, clouds) and project information from these to the AEMs to add to the visual reality. Our primary contributions are: (1) Analyzing videos with different viewpoints, coverage, and overlaps to extract relevant information about view geometry and movements, with limited user input. (2) Projecting this information appropriately to the viewpoint of the AEMs and modeling the dynamics in the scene from observations to allow inference (in case of missing data) and synthesis. We demonstrate this over a variety of camera configurations and conditions. (3) Registering the modeled information from videos to the AEMs to render appropriate movements and related dynamics. We demonstrate this with traffic flow, people movements, and cloud motions. All of these approaches are brought together as a prototype system for a real-time visualization of a city that is alive and engaging.
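Contribution (2) involves projecting information from a video into the viewpoint of the AEM. For motion on the ground, such as cars and pedestrians, one common way to do this is a planar homography between the image and the map. The sketch below assumes such a 3x3 homography `H` is already known; estimating it, and the paper's actual registration procedure, are not shown, and the example matrices are made up.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, points: Sequence[Point]) -> List[Point]:
    """Map image points to map coordinates with a 3x3 homography (homogeneous divide)."""
    mapped = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under this homography")
        mapped.append((u / w, v / w))
    return mapped


def map_velocities(H: Matrix3, track: Sequence[Point], dt: float) -> List[Point]:
    """Finite-difference velocities of a tracked object after projecting it onto the map."""
    pts = apply_homography(H, track)
    return [((x1 - x0) / dt, (y1 - y0) / dt) for (x0, y0), (x1, y1) in zip(pts, pts[1:])]


if __name__ == "__main__":
    H = [[2, 0, 10], [0, 2, 5], [0, 0, 1]]  # made-up scale-and-shift homography
    print(apply_homography(H, [(1, 1)]))                       # [(12.0, 7.0)]
    print(map_velocities(H, [(0, 0), (1, 0), (2, 0)], dt=0.5))  # [(4.0, 0.0), (4.0, 0.0)]
```

Computing velocities after projection, rather than in image space, keeps speeds comparable across cameras with different viewpoints, which matters when several videos feed the same map.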

Augmented Earth


Poster STS 2011: “3-Dimensional Visualization of the Operating Room Using Advanced Motion Capture: A Novel Paradigm to Expand Simulation-Based Surgical Education”

February 2nd, 2011 Irfan Essa Posted in Computational Photography and Video, Eric Sarin, Health Systems, Kihwan Kim, Papers, Uncategorized, William Cooper

3-Dimensional Visualization of the Operating Room Using Advanced Motion Capture: A Novel Paradigm to Expand Simulation-Based Surgical Education

  • Sarin, Kim, Essa, and Cooper (2011), “3-Dimensional Visualization of the Operating Room Using Advanced Motion Capture: A Novel Paradigm to Expand Simulation-Based Surgical Education,” in Proceedings of Society of Thoracic Surgeons Annual Meeting, Society of Thoracic Surgeons, 2011. [BLOG] [BIBTEX]
    
    @incollection{2011-Sarin-3VORUAMCNPESSE,
      Author = {E. L. Sarin and K. Kim and I. Essa and W. A. Cooper},
      Blog = {http://prof.irfanessa.com/2011/02/02/sts-2011/},
      Booktitle = {Proceedings of Society of Thoracic Surgeons Annual Meeting},
      Month = {January},
      Publisher = {Society of Thoracic Surgeons},
      Title = {3-Dimensional Visualization of the Operating Room Using Advanced Motion Capture: A Novel Paradigm to Expand Simulation-Based Surgical Education},
      Type = {Poster and Video Presentation},
      Year = {2011}}

A collaborative project between the School of Interactive Computing, Georgia Institute of Technology, Atlanta, Georgia; the Division of Cardiothoracic Surgery, Emory University School of Medicine, Atlanta, Georgia; and the Inova Heart and Vascular Institute, Fairfax, Virginia. This was a video and poster presentation at the Society of Thoracic Surgeons Annual Meeting in San Diego, CA, in January 2011.

Poster for the Society of Thoracic Surgeons' Annual Meeting


Paper (2011) in IEEE PAMI: “Bilayer Segmentation of Webcam Videos Using Tree-Based Classifiers”

January 12th, 2011 Irfan Essa Posted in Antonio Criminisi, Computational Photography and Video, John Winn, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin

Bilayer Segmentation of Webcam Videos Using Tree-Based Classifiers

P. Yin, A. Criminisi, J. Winn, and I. Essa (2011), “Bilayer Segmentation of Webcam Videos Using Tree-Based Classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, Jan. 2011, ISSN 0162-8828, DOI 10.1109/TPAMI.2010.65, IEEE Computer Society. [Project Page] [DOI]

Abstract

This paper presents an automatic segmentation algorithm for video frames captured by a (monocular) webcam that closely approximates depth segmentation from a stereo camera. The frames are segmented into foreground and background layers that comprise a subject (participant) and other objects and individuals, respectively. The algorithm produces correct segmentations even in the presence of large background motion with a nearly stationary foreground. This research makes three key contributions: First, we introduce a novel motion representation, referred to as “motons,” inspired by research in object recognition. Second, we propose estimating the segmentation likelihood from the spatial context of motion. The estimation is efficiently learned by random forests. Third, we introduce a general taxonomy of tree-based classifiers that facilitates both theoretical and experimental comparisons of several known classification algorithms and generates new ones. In our bilayer segmentation algorithm, diverse visual cues such as motion, motion context, color, contrast, and spatial priors are fused by means of a conditional random field (CRF) model. Segmentation is then achieved by binary min-cut. Experiments on many sequences of our videochat application demonstrate that our algorithm, which requires no initialization, is effective in a variety of scenes, and the segmentation results are comparable to those obtained by stereo systems.
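The abstract's final step, binary min-cut over a CRF energy, can be illustrated on a toy problem. This minimal sketch assumes a 1D chain of pixels with hand-set unary costs and a single Potts smoothness weight; the motons, the random-forest likelihoods, and the multi-cue energy of the paper are not modeled, and Edmonds-Karp is used for readability rather than the specialized max-flow solvers common in vision.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set, Tuple

Graph = Dict[int, Dict[int, float]]


def max_flow_min_cut(capacity: Graph, source: int, sink: int) -> Tuple[float, Set[int]]:
    """Edmonds-Karp max flow; returns the flow value and the source side of a minimum cut."""
    residual: Graph = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edges start empty
    flow = 0.0
    while True:
        parent: Dict[int, Optional[int]] = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, set(parent)  # nodes still reachable from the source
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def segment_binary(fg_cost: Sequence[float], bg_cost: Sequence[float], smoothness: float) -> List[int]:
    """Label each pixel of a 1D chain foreground (1) or background (0) by minimum cut."""
    n = len(fg_cost)
    source, sink = n, n + 1
    cap: Graph = {source: {}}
    for i in range(n):
        cap.setdefault(i, {})
        cap[source][i] = bg_cost[i]  # cut when i lands on the sink (background) side
        cap[i][sink] = fg_cost[i]    # cut when i stays on the source (foreground) side
    for i in range(n - 1):
        cap[i][i + 1] = smoothness   # Potts penalty for neighbors with different labels
        cap[i + 1][i] = smoothness
    _, source_side = max_flow_min_cut(cap, source, sink)
    return [1 if i in source_side else 0 for i in range(n)]


if __name__ == "__main__":
    print(segment_binary([0, 0, 3, 0, 0], [4, 4, 1, 4, 4], smoothness=2))  # [1, 1, 1, 1, 1]
    print(segment_binary([0, 0, 3, 0, 0], [4, 4, 1, 4, 4], smoothness=0))  # [1, 1, 0, 1, 1]
```

The two calls show why the pairwise term matters: pixel 2 prefers background on its own, but with a smoothness weight of 2 switching it would cost two label boundaries, so the cut keeps it with its foreground neighbors.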

via IEEE Xplore – Abstract Page.


In the News (2010): DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery

September 2nd, 2010 Irfan Essa Posted in Activity Recognition, Grant Schindler, PERSEAS, Visual Surveillance

via Kitware – News: DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery.

Kitware has received a $13,883,314 contract from Defense Advanced Research Projects Agency (DARPA) to develop a software system capable of automatically and interactively discovering actionable intelligence from wide area motion imagery (WAMI) of complex urban, suburban, and rural environments.

The primary information elements in WAMI data are moving entities in the context of roads, buildings, and other scene features. These entities, while exploitable, often yield fragmented tracks in complex urban environments due to occlusions, stops, and other factors. Kitware’s software system will use algorithmic solutions to associate tracks and then identify and integrate local events to detect potential threats and perform forensic analysis.
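The release does not describe how Kitware's system associates fragmented tracks. The sketch below is a hypothetical stand-in: a simple greedy linker that joins the end of one tracklet to the start of a later one when the time gap and spatial jump are small. The thresholds `max_gap` and `max_dist` are made-up parameters, and practical trackers usually add motion prediction and a global assignment step.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tracklet:
    track_id: int
    points: List[Tuple[float, float, float]]  # (t, x, y), sorted by time


def link_tracklets(tracklets: List[Tracklet], max_gap: float, max_dist: float) -> Dict[int, int]:
    """Greedily link each tracklet's end to a later tracklet's start, smallest spatial jump first."""
    candidates = []
    for a in tracklets:
        t_end, xa, ya = a.points[-1]
        for b in tracklets:
            if b.track_id == a.track_id:
                continue
            t_start, xb, yb = b.points[0]
            gap = t_start - t_end
            dist = math.hypot(xb - xa, yb - ya)
            if 0 < gap <= max_gap and dist <= max_dist:
                candidates.append((dist, a.track_id, b.track_id))
    successor: Dict[int, int] = {}
    has_predecessor = set()
    for _, a_id, b_id in sorted(candidates):
        if a_id not in successor and b_id not in has_predecessor:
            successor[a_id] = b_id
            has_predecessor.add(b_id)
    return successor


def merge_chains(tracklets: List[Tracklet], successor: Dict[int, int]) -> List[List[Tuple[float, float, float]]]:
    """Concatenate linked tracklets into full tracks, one list of points per chain."""
    by_id = {t.track_id: t for t in tracklets}
    linked_to = set(successor.values())
    merged = []
    for tid in sorted(i for i in by_id if i not in linked_to):  # chains start at unlinked tracklets
        points: List[Tuple[float, float, float]] = []
        current = tid
        while current is not None:
            points.extend(by_id[current].points)
            current = successor.get(current)
        merged.append(points)
    return merged
```

Requiring a strictly positive time gap means links always point forward in time, so chains cannot loop. This is the kind of repair needed when an occlusion or a stop splits one vehicle into several fragments.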

The developed algorithms will form the basis of a software prototype called the Persistent Stare Exploitation and Analysis System (PerSEAS) that will significantly augment an end-user’s ability to discover novel intelligence using models of activities, normalcy, and context. Since the vast majority of events are normal and pose no threat, the models must cross-integrate singular events to discover relationships and anomalies that are indicative of suspicious behavior or match previously learned – or defined – threat activity.

The advanced PerSEAS system will markedly improve an analyst’s ability to handle burgeoning WAMI data and reduce the time required to perform many current exploitation tasks, greatly enhancing the military’s capability to analyze and utilize the data for forensic analysis and through the issuance of timely threat alerts with a minimal number of false alarms.

Due to the complex, multi-disciplinary nature of the research, Kitware will partner with academic experts in the fields of computer vision, probabilistic reasoning, machine learning and other related domains. Phase I of the research is expected to be completed in two years.

The awarded contract will expand Kitware’s leadership in the field of computer vision, video analysis and advanced visualization software. The project will build upon our previous DARPA-sponsored research into content-based video retrieval on the VIRAT program; anomaly detection on the PANDA program; and the recognition of complex multi-agent activities in video.

To meet the PerSEAS program’s needs, Kitware has assembled a world-class team including four leading defense technology companies (Northrop Grumman Corporation; Honeywell Automation and Control Solutions Laboratories; Aptima, Inc.; and Navia, Inc.), as well as multiple internationally renowned research institutions, including the University of California, Berkeley; the Computer Vision Laboratory, University of Maryland; Rensselaer Polytechnic Institute; the Computer Vision Lab at the University of Central Florida; the School of Interactive Computing at Georgia Tech and its affiliated Center for Robotics & Intelligent Machines; and Columbia University.


Paper in CVPR (2010): “Motion Field to Predict Play Evolution in Dynamic Sport Scenes”

June 13th, 2010 Irfan Essa Posted in Activity Recognition, Jessica Hodgins, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Sports Visualization

Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa (2010) “Motion Field to Predict Play Evolution in Dynamic Sport Scenes” in Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010 [PDF][Website][DOI][Video (Youtube)].

Abstract

Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations where the play will evolve, e.g., where interesting events will occur, by tracking player positions and movements over time. We start by extracting the ground-level sparse movement of players in each time-step, and then generate a dense motion field. Using this field, we detect locations where the motion converges, implying positions towards which the play is evolving. We evaluate our approach by analyzing videos of a variety of complex soccer plays.
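The pipeline in the abstract (sparse player motion, then a dense motion field, then points where motion converges) can be sketched on a small grid. Two assumptions here are not from the abstract: inverse-distance weighting fills in the dense field, and "convergence" is measured as the most negative divergence. Grid size and units are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float, float]  # (x, y, vx, vy): ground-plane position and velocity
Field = List[List[Tuple[float, float]]]     # field[y][x] = (vx, vy)


def dense_field(samples: Sequence[Sample], width: int, height: int, power: float = 2.0) -> Field:
    """Spread sparse player velocities onto a grid with inverse-distance weighting."""
    field: Field = []
    for gy in range(height):
        row = []
        for gx in range(width):
            wsum = vx = vy = 0.0
            for x, y, sx, sy in samples:
                d2 = (gx - x) ** 2 + (gy - y) ** 2
                w = 1.0 / (d2 ** (power / 2) + 1e-9)  # epsilon avoids division by zero at a sample
                wsum += w
                vx += w * sx
                vy += w * sy
            row.append((vx / wsum, vy / wsum))
        field.append(row)
    return field


def divergence(field: Field) -> List[List[float]]:
    """Central-difference divergence at interior cells (border cells are left at 0)."""
    h, w = len(field), len(field[0])
    div = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dvx_dx = (field[y][x + 1][0] - field[y][x - 1][0]) / 2
            dvy_dy = (field[y + 1][x][1] - field[y - 1][x][1]) / 2
            div[y][x] = dvx_dx + dvy_dy
    return div


def convergence_point(field: Field) -> Tuple[int, int]:
    """Interior grid cell (x, y) with the most negative divergence, i.e. the strongest sink."""
    div = divergence(field)
    h, w = len(div), len(div[0])
    return min((div[y][x], (x, y)) for y in range(1, h - 1) for x in range(1, w - 1))[1]
```

For example, four players on the edges of an 11x11 grid all running toward the middle, `[(0, 5, 1, 0), (10, 5, -1, 0), (5, 0, 0, 1), (5, 10, 0, -1)]`, produce negative divergence near the center, which is where this sketch would predict the play is heading.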

CVPR 2010 Paper on Play Evolution
