MENU: Home Bio Affiliations Research Teaching Publications Collaborators/Students Contact FAQ ©2007-13 RSS

Paper in ECCV Workshop 2012: “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos”

October 7th, 2012 Irfan Essa Posted in Activity Recognition, Awards, Google, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra, WWW No Comments »

Weakly Supervised Learning of Object Segmentations from Web-Scale Videos

  • G. Hartmann, M. Grundmann, J. Hoffman, D. Tsai, V. Kwatra, O. Madani, S. Vijayanarasimhan, I. Essa, J. Rehg, and R. Sukthankar (2012), “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos,” in Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media, 2012. [PDF] [BIBTEX]
    @inproceedings{2012-Hartmann-WSLOSFWV,
      Author = {Glenn Hartmann and Matthias Grundmann and Judy Hoffman and David Tsai and Vivek Kwatra and Omid Madani and Sudheendra Vijayanarasimhan and Irfan Essa and James Rehg and Rahul Sukthankar},
      Booktitle = {Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media},
      Date-Added = {2012-10-23 15:03:18 +0000},
      Date-Modified = {2012-10-23 15:07:04 +0000},
      Pdf = {http://www.cs.cmu.edu/~rahuls/pub/eccv2012wk-cp-rahuls.pdf},
      Title = {Weakly Supervised Learning of Object Segmentations from Web-Scale Videos},
      Year = {2012}}

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Speci cally, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classi ers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classi ers are further re ned using graphcuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we con rm that our proposed methods can learn good object masks just by watching YouTube.

Presented at: ECCV 2012 Workshop on Web-scale Vision and Social Media, 2012, October 7-12, 2012, in Florence, ITALY.

Awarded the BEST PAPER AWARD!

 

AddThis Social Bookmark Button

Paper in IROS 2012: “Linguistic Transfer of Human Assembly Tasks to Robots”

October 7th, 2012 Irfan Essa Posted in 0205507, Activity Recognition, IROS/ICRA, Mike Stilman, Robotics No Comments »

Linguistic Transfer of Human Assembly Tasks to Robots

  • N. Dantam, I. Essa, and M. Stilman (2012), “Linguistic Transfer of Human Assembly Tasks to Robots,” in Proceedings of Intelligent Robots and Systems (IROS), 2012. [PDF] [BIBTEX]
    @inproceedings{2012-Dantam-LTHATR,
      Author = {N. Dantam and I. Essa and M. Stilman},
      Booktitle = {Proceedings of Intelligent Robots and Systems (IROS)},
      Date-Added = {2012-10-23 15:07:46 +0000},
      Date-Modified = {2012-10-23 15:08:57 +0000},
      Pdf = {http://www.cc.gatech.edu/~ndantam3/papers/dantam2012assembly.pdf},
      Title = {Linguistic Transfer of Human Assembly Tasks to Robots},
      Year = {2012}}

Abstract

We demonstrate the automatic transfer of an assembly task from human to robot. This work extends efforts showing the utility of linguistic models in verifiable robot control policies by now performing real visual analysis of human demonstrations to automatically extract a policy for the task. This method tokenizes each human demonstration into a sequence of object connection symbols, then transforms the set of sequences from all demonstrations into an automaton, which represents the task-language for assembling a desired object. Finally, we combine this assembly automaton with a kinematic model of a robot arm to reproduce the demonstrated task.

Presented at: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012), October 7-12, 2012 Vilamoura, Algarve, Portugal.

 

AddThis Social Bookmark Button

At CVPR 2012, in Providence, RI, June 16 – 21, 2012

June 17th, 2012 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Presentations, Vivek Kwatra No Comments »

At IEEE CVPR 2012 is in Providence RI, from Jun 16 – 21, 2012.

Busy week ahead meeting good friends and colleagues. Here are some highlights of what my group is involved with.

Paper in Main Conference

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [Poster on Tuesday 6/19/2012]

Demo in Main Conference

  • M. Grundmann, V. Kwatra, D. Castro, and I. Essa (2012), “Calibration-Free Rolling Shutter Removal,” in [WEBSITE] [VIDEO] (Paper in ICCP 2012) [Demo on Monday and Tuesday (6/18-19) at the Google Booth]

Invited Talk in Workshop

AddThis Social Bookmark Button

Paper in IEEE CVPR 2012: “Detecting Regions of Interest in Dynamic Scenes with Camera Motions”

June 16th, 2012 Irfan Essa Posted in Activity Recognition, Kihwan Kim, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, PERSEAS, Visual Surviellance No Comments »

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [BLOG] [BIBTEX]
    @inproceedings{2012-Kim-DRIDSWCM,
      Author = {Kihwan Kim and Dongreyol Lee and Irfan Essa},
      Blog = {http://prof.irfanessa.com/2012/04/09/paper-cvpr2012/},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2012-04-09 22:37:06 +0000},
      Date-Modified = {2012-04-30 22:26:13 +0000},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2012-Kim-DRIDSWCM.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Detecting Regions of Interest in Dynamic Scenes with Camera Motions},
      Url = {http://www.cc.gatech.edu/cpl/projects/roi/},
      Video = {http://www.youtube.com/watch?v=19BMwDMCSp8},
      Year = {2012},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/roi/}}

Abstract

We present a method to detect the regions of interests in moving camera views of dynamic scenes with multiple mov- ing objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting important future regions of interest as the scene evolves dynamically.

We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient, and provides better prediction than those of previously proposed RBF-based approaches.

Presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, Providence, RI, June 16-21, 2012

AddThis Social Bookmark Button

Presentation at CVPR 2012 workshop on Large Scale Video Search and Mining “Extracting Content and Context from Video.”

June 5th, 2012 Irfan Essa Posted in Activity Recognition, PAMI/ICCV/CVPR/ECCV No Comments »

Extracting Content and Context from Video.

(Presentation at CVPR 2012 workshop on Large Scale Video Search and Mining 2012, June 21, 2012)

Irfan Essa
GEORGIA Tech
prof.irfanessa.com

Abstract

In this talk, I will describe various efforts aimed at extracting context and content from video. I will highlight some of our recent work in extracting spatio-temporal features and the related saliency information from the video, which can be used to detect and localize regions of interest in video. Then I will describe approaches that use structured and unstructured representations to recognize the complex and extended-time actions.  I will also discuss the need for unsupervised activity discovery, and detection of anomalous activities from videos. I will show a variety of examples, which will include online videos, mobile videos, surveillance and home monitoring video, and sports videos. Finally, I will pose a series of questions and make observations about how we need to extend our current paradigms of video understanding to go beyond local spatio-temporal features, and standard time-series and bag of words models.

AddThis Social Bookmark Button

AT IWCV 2012: “Videos Understanding: Extracting Content and Context from Video.”

May 24th, 2012 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Presentations, Visual Surviellance No Comments »

Videos Understanding: Extracting Content and Context from Video.

(Presentation at the International Workshop on Computer Vision 2012, Ortigia, Siracusa, Sicily, May 22-24, 2012.)

Irfan Essa
GEORGIA Tech

Abstract

In this talk, I will describe various efforts aimed at extracting context and content from video. I will highlight some of our recent work in extracting spatio-temporal features and the related saliency information from the video, which can be used to detect and localize regions of interest in video. Then I will describe approaches that use structured and unstructured representations to recognize the complex and extended-time actions.  I will also discuss the need for unsupervised activity discovery, and detection of anomalous activities from videos. I will show a variety of examples, which will include online videos, mobile videos, surveillance and home monitoring video, and sports videos. Finally, I will pose a series of questions and make observations about how we need to extend our current paradigms of video understanding to go beyond local spatio-temporal features, and standard time-series and bag of words models.

AddThis Social Bookmark Button

Paper in ICCV 2011: “Gaussian Process Regression Flow for Analysis of Motion Trajectories”

October 28th, 2011 Irfan Essa Posted in Activity Recognition, DARPA, Kihwan Kim, PAMI/ICCV/CVPR/ECCV, Papers No Comments »

Gaussian Process Regression Flow for Analysis of Motion Trajectories

  • Kim, Lee, and Essa (2011), “Gaussian Process Regression Flow for Analysis of Motion Trajectories,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011. [PDF] [WEBSITE] [VIDEO] [BIBTEX]
     @inproceedings{Kim2011-GPRF, Author = {K. Kim and D. Lee and I. Essa}, Booktitle = {Proceedings of IEEE International Conference on Computer Vision (ICCV)}, Month = {November}, Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Kim-GPRFAMT.pdf}, Publisher = {IEEE Computer Society}, Title = {Gaussian Process Regression Flow for Analysis of Motion Trajectories}, Url = {http://www.cc.gatech.edu/cpl/projects/gprf/}, Video = {http://www.youtube.com/watch?v=UtLr37hDQz0}, Year = {2011}}

Abstract

Analysis and Recognition of motions and activities of objects in videos requires effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data.

Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. This representation also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic monitoring domains and test on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates

AddThis Social Bookmark Button

Funding (2011) NSF (1146352) “EAGER: Linguistic Task Transfer for Humans and Cyber Systems”

September 1st, 2011 Irfan Essa Posted in Activity Recognition, Mike Stilman, NSF, Robotics No Comments »

EAGER: Linguistic Task Transfer for Humans and Cyber Systems (Mike Stillman, Irfan Essa) NSF/RI

This project, investigating formal languages as a general methodology for task transfer between distinct cyber-physical systems such as humans and robots, aims to expand the science of cyber physical systems by developing Motion Grammars that will enable task transfer between distinct systems.

Formal languages are tools for encoding, describing and transferring structured knowledge. In natural language, the latter process is called communication. Similarly, we will develop a formal language through which arbitrary cyber-physical systems communicate tasks via structured actions. This investigation of Motion Grammars will contribute to the science of human cognition and the engineering of cyber-physical algorithms. By observing human activities during manipulation we will develop a novel class of hybrid control algorithms based on linguistic representations of task execution. These algorithms will broaden the capabilities of man-made systems and provide the infrastructure for motion transfer between humans, robots and broader systems in a generic context. Furthermore, the representation in a rigorous grammatical context will enable formal verification and validation in future work.
Broader Impacts: The proposed research has direct applications to new solutions for manufacturing, medical treatments such as surgery, logistics and food processing. In turn, each of these areas has a significant impact on the efficiency and convenience of our daily lives. The PIs serve as coordinators of graduate/undergraduate programs and mentors to community schools. In order to guarantee that women and minorities have a significant role in the research, the PIs will annually invite K-12 students from Atlanta schools with primarily African American populations to the laboratories. One-day robot classes will be conducted that engage students in the excitement of hands-on science by interactively using lab equipment to transfer their manipulation skills to a robot arm.

Via Award#1146352 – EAGER: Linguistic Task Transfer for Humans and Cyber Systems.

AddThis Social Bookmark Button

Presentation (2011) at IBPRIA 2011: “Spatio-Temporal Video Analysis and Visual Activity Recognition”

June 8th, 2011 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Presentations No Comments »

“Spatio-Temporal Video Analysis and Visual Activity Recognition” at the Iberian Conference on Pattern Recognition and Image Analysis  (IbPRIA) 2011 Conference in Las Palmas de Gran Canaria. Spain. June 8-10.

Abstract

My research group is focused on a variety of approaches for (a) low-level video analysis and synthesis and (b) recognizing activities in videos. In this talk, I will concentrate on two of our recent efforts. One effort aimed at robust spatio-temporal segmentation of video and another on using motion and flow to recognize and predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this work, we begin by over segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video-retargeting that use spatio-temporal saliency derived from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research).

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the playing field. To this end, we propose a novel approach to detect the locations of where the play evolution will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach for soccer, basketball and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research).

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website.

AddThis Social Bookmark Button

In the News (2010): DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery

September 2nd, 2010 Irfan Essa Posted in Activity Recognition, Grant Schindler, PERSEAS, Visual Surviellance No Comments »

via Kitware – News: DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery.

Kitware has received a $13,883,314 contract from Defense Advanced Research Projects Agency (DARPA) to develop a software system capable of automatically and interactively discovering actionable intelligence from wide area motion imagery (WAMI) of complex urban, suburban, and rural environments.

The primary information elements in WAMI data are moving entities in the context of roads, buildings, and other scene features. These entities, while exploitable, often yield fragmented tracks in complex urban environments due to occlusions, stops, and other factors. Kitware’s software system will use algorithmic solutions to associate tracks and then identify and integrate local events to detect potential threats and perform forensic analysis.

The developed algorithms will form the basis of a software prototype called the Persistent Stare Exploitation and Analysis System (PerSEAS) that will significantly augment an end-user’s ability to discover novel intelligence using models of activities, normalcy, and context. Since the vast majority of events are normal and pose no threat, the models must cross-integrate singular events to discover relationships and anomalies that are indicative of suspicious behavior or match previously learned – or defined – threat activity.

The advanced PerSEAS system will markedly improve an analyst’s ability to handle burgeoning WAMI data and reduce the time required to perform many current exploitation tasks, greatly enhancing the military’s capability to analyze and utilize the data for forensic analysis and through the issuance of timely threat alerts with a minimal number of false alarms.

Due to the complex, multi-disciplinary nature of the research, Kitware will partner with academic experts in the fields of computer vision, probabilistic reasoning, machine learning and other related domains. Phase I of the research is expected to be completed in two years.

The awarded contract will expand Kitware’s leadership in the field of computer vision, video analysis and advanced visualization software. The project will build upon our previous DARPA-sponsored research into content-based video retrieval on the VIRAT program; anomaly detection on the PANDA program; and the recognition of complex multi-agent activities in video.

To meet the PerSEAS program’s needs, Kitware has assembled a world-class team including four leading defense technology companies, Northrop Grumman Corporation, ; Honeywell Automation and Control Solutions Laboratories, Aptima, Inc., and Navia, Inc. As well as multiple internationally-renowned research institutions, including: the University of California, Berkeley; Computer Vision Laboratory, University of Maryland; Rensselaer Polytechnic Institute; the Computer Vision Lab at the University of Central Florida; the School of Interactive Computing at Georgia Tech and its affiliated Center for Robotics & Intelligent Machines; and Columbia University.

 

AddThis Social Bookmark Button