MENU: Home Bio Affiliations Research Teaching Publications Collaborators/Students Calendar Contact FAQ ©2007-12 RSS

Paper in IEEE CVPR 2012: “Detecting Regions of Interest in Dynamic Scenes with Camera Motions”

April 9th, 2012 Irfan Essa Posted in Activity Recognition, Kihwan Kim, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, PERSEAS, Visual Surviellance No Comments »

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [BLOG] [BIBTEX]
    @inproceedings{2012-Kim-DRIDSWCM,
      Author = {Kihwan Kim and Dongreyol Lee and Irfan Essa},
      Blog = {http://prof.irfanessa.com/2012/04/09/paper-cvpr2012/},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2012-04-09 22:37:06 +0000},
      Date-Modified = {2012-04-30 22:26:13 +0000},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2012-Kim-DRIDSWCM.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Detecting Regions of Interest in Dynamic Scenes with Camera Motions},
      Url = {http://www.cc.gatech.edu/cpl/projects/roi/},
      Video = {http://www.youtube.com/watch?v=19BMwDMCSp8},
      Year = {2012},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/roi/}}

Abstract

We present a method to detect the regions of interests in moving camera views of dynamic scenes with multiple mov- ing objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting important future regions of interest as the scene evolves dynamically.

We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient, and provides better prediction than those of previously proposed RBF-based approaches.

Presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, Providence, RI, June 16-21, 2012

AddThis Social Bookmark Button

Paper in ICCV 2011: “Gaussian Process Regression Flow for Analysis of Motion Trajectories”

October 28th, 2011 Irfan Essa Posted in Activity Recognition, DARPA, Kihwan Kim, PAMI/ICCV/CVPR/ECCV, Papers No Comments »

Gaussian Process Regression Flow for Analysis of Motion Trajectories

  • Kim, Lee, and Essa (2011), “Gaussian Process Regression Flow for Analysis of Motion Trajectories,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011. [PDF] [WEBSITE] [VIDEO] [BIBTEX]
     @inproceedings{Kim2011-GPRF, Author = {K. Kim and D. Lee and I. Essa}, Booktitle = {Proceedings of IEEE International Conference on Computer Vision (ICCV)}, Month = {November}, Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Kim-GPRFAMT.pdf}, Publisher = {IEEE Computer Society}, Title = {Gaussian Process Regression Flow for Analysis of Motion Trajectories}, Url = {http://www.cc.gatech.edu/cpl/projects/gprf/}, Video = {http://www.youtube.com/watch?v=UtLr37hDQz0}, Year = {2011}}

Abstract

Analysis and Recognition of motions and activities of objects in videos requires effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data.

Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. This representation also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic monitoring domains and test on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates

AddThis Social Bookmark Button

Funding (2011) NSF (1146352) “EAGER: Linguistic Task Transfer for Humans and Cyber Systems”

September 1st, 2011 Irfan Essa Posted in Activity Recognition, Mike Stilman, NSF, Robotics No Comments »

EAGER: Linguistic Task Transfer for Humans and Cyber Systems (Mike Stillman, Irfan Essa) NSF/RI

This project, investigating formal languages as a general methodology for task transfer between distinct cyber-physical systems such as humans and robots, aims to expand the science of cyber physical systems by developing Motion Grammars that will enable task transfer between distinct systems.

Formal languages are tools for encoding, describing and transferring structured knowledge. In natural language, the latter process is called communication. Similarly, we will develop a formal language through which arbitrary cyber-physical systems communicate tasks via structured actions. This investigation of Motion Grammars will contribute to the science of human cognition and the engineering of cyber-physical algorithms. By observing human activities during manipulation we will develop a novel class of hybrid control algorithms based on linguistic representations of task execution. These algorithms will broaden the capabilities of man-made systems and provide the infrastructure for motion transfer between humans, robots and broader systems in a generic context. Furthermore, the representation in a rigorous grammatical context will enable formal verification and validation in future work.
Broader Impacts: The proposed research has direct applications to new solutions for manufacturing, medical treatments such as surgery, logistics and food processing. In turn, each of these areas has a significant impact on the efficiency and convenience of our daily lives. The PIs serve as coordinators of graduate/undergraduate programs and mentors to community schools. In order to guarantee that women and minorities have a significant role in the research, the PIs will annually invite K-12 students from Atlanta schools with primarily African American populations to the laboratories. One-day robot classes will be conducted that engage students in the excitement of hands-on science by interactively using lab equipment to transfer their manipulation skills to a robot arm.

Via Award#1146352 – EAGER: Linguistic Task Transfer for Humans and Cyber Systems.

AddThis Social Bookmark Button

Presentation (2011) at IBPRIA 2011: “Spatio-Temporal Video Analysis and Visual Activity Recognition”

June 8th, 2011 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Presentations No Comments »

“Spatio-Temporal Video Analysis and Visual Activity Recognition” at the Iberian Conference on Pattern Recognition and Image Analysis  (IbPRIA) 2011 Conference in Las Palmas de Gran Canaria. Spain. June 8-10.

Abstract

My research group is focused on a variety of approaches for (a) low-level video analysis and synthesis and (b) recognizing activities in videos. In this talk, I will concentrate on two of our recent efforts. One effort aimed at robust spatio-temporal segmentation of video and another on using motion and flow to recognize and predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this work, we begin by over segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video-retargeting that use spatio-temporal saliency derived from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research).

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the playing field. To this end, we propose a novel approach to detect the locations of where the play evolution will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach for soccer, basketball and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research).

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website.

AddThis Social Bookmark Button

In the News (2010): DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery

September 2nd, 2010 Irfan Essa Posted in Activity Recognition, Grant Schindler, PERSEAS, Visual Surviellance No Comments »

via Kitware – News: DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery.

Kitware has received a $13,883,314 contract from Defense Advanced Research Projects Agency (DARPA) to develop a software system capable of automatically and interactively discovering actionable intelligence from wide area motion imagery (WAMI) of complex urban, suburban, and rural environments.

The primary information elements in WAMI data are moving entities in the context of roads, buildings, and other scene features. These entities, while exploitable, often yield fragmented tracks in complex urban environments due to occlusions, stops, and other factors. Kitware’s software system will use algorithmic solutions to associate tracks and then identify and integrate local events to detect potential threats and perform forensic analysis.

The developed algorithms will form the basis of a software prototype called the Persistent Stare Exploitation and Analysis System (PerSEAS) that will significantly augment an end-user’s ability to discover novel intelligence using models of activities, normalcy, and context. Since the vast majority of events are normal and pose no threat, the models must cross-integrate singular events to discover relationships and anomalies that are indicative of suspicious behavior or match previously learned – or defined – threat activity.

The advanced PerSEAS system will markedly improve an analyst’s ability to handle burgeoning WAMI data and reduce the time required to perform many current exploitation tasks, greatly enhancing the military’s capability to analyze and utilize the data for forensic analysis and through the issuance of timely threat alerts with a minimal number of false alarms.

Due to the complex, multi-disciplinary nature of the research, Kitware will partner with academic experts in the fields of computer vision, probabilistic reasoning, machine learning and other related domains. Phase I of the research is expected to be completed in two years.

The awarded contract will expand Kitware’s leadership in the field of computer vision, video analysis and advanced visualization software. The project will build upon our previous DARPA-sponsored research into content-based video retrieval on the VIRAT program; anomaly detection on the PANDA program; and the recognition of complex multi-agent activities in video.

To meet the PerSEAS program’s needs, Kitware has assembled a world-class team including four leading defense technology companies, Northrop Grumman Corporation, ; Honeywell Automation and Control Solutions Laboratories, Aptima, Inc., and Navia, Inc. As well as multiple internationally-renowned research institutions, including: the University of California, Berkeley; Computer Vision Laboratory, University of Maryland; Rensselaer Polytechnic Institute; the Computer Vision Lab at the University of Central Florida; the School of Interactive Computing at Georgia Tech and its affiliated Center for Robotics & Intelligent Machines; and Columbia University.

 

AddThis Social Bookmark Button

Paper in CVPR (2010): “Motion Field to Predict Play Evolution in Dynamic Sport Scenes

June 13th, 2010 Irfan Essa Posted in Activity Recognition, Jessica Hodgins, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Sports Visualization No Comments »

Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa (2010) “Motion Field to Predict Play Evolution in Dynamic Sport Scenes” in Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010 [PDF][Website][DOI][Video (Youtube)].

Abstract

Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations of where the play evolution will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. We start by extracting the ground level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. We evaluate our approach by analyzing videos of a variety of complex soccer plays.

CVPR 2010 Paper on Play Evolution

AddThis Social Bookmark Button

Paper in CVPR (2010): “Player Localization Using Multiple Static Cameras for Sports Visualization”

June 13th, 2010 Irfan Essa Posted in Activity Recognition, Jessica Hodgins, Kihwan Kim, Matthias Grundmann, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Raffay Hamid, Sports Visualization No Comments »

Raffay Hamid, Ram Krishan Kumar, Matthias Grundmann, Kihwan Kim, Irfan Essa, Jessica Hodgins (2010), “Player Localization Using Multiple Static Cameras for Sports Visualization” In Proceedings of IEEE Computer Vision and Pattern Recognition Conference (CVPR), San Francisco, CA, USA, June 2010 [PDF][Website][DOI][Video (Youtube)].

Abstract

We present a novel approach for robust localization of multiple people observed using multiple cameras. We usethis location information to generate sports visualizations,which include displaying a virtual offside line in soccer games, and showing players’ positions and motion patterns.Our main contribution is the modeling and analysis for the problem of fusing corresponding players’ positional informationas finding minimum weight K-length cycles in complete K-partite graphs. To this end, we use a dynamic programmingbased approach that varies over a continuum of being maximally to minimally greedy in terms of the numberof paths explored at each iteration. We present an end-to-end sports visualization framework that employs our proposed algorithm-class. We demonstrate the robustness of our framework by testing it on 60; 000 frames of soccerfootage captured over 5 different illumination conditions, play types, and team attire.

Teaser Image from CVPR 2010 paper

AddThis Social Bookmark Button

CVPR 2010: Accepted Papers

April 1st, 2010 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Jessica Hodgins, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra No Comments »

We have the following 4 papers that have been accepted for publications in IEEE CVPR 2010. More details forthcoming, with links to more details.
  • Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa (2010) “Discontinuous Seam-Carving for Video Retargeting” (a GA Tech, Google Collaboration)
  • Matthias Grundmann, Vivek Kwatra, Mei Han, and Irfan Essa (2010) “Efficient Hierarchical Graph-Based Video Segmentation” (a GA Tech, Google Collaboration)
  • Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, and Irfan Essa (2010) “Motion Fields to Predict Play Evolution in Dynamic Sport Scenes” (a GA Tech, Disney Collaboration)
  • Raffay Hamid, Ramkrishan Kumar, Matthias Grundmann, Kihwan Kim, Irfan Essa, and Jessica Hodgins (2010) “Player Localization Using Multiple Static Cameras for Sports Visualization” (a GA Tech, Disney Collaboration)
AddThis Social Bookmark Button

Paper in Advanced Robotics (2009): “Human Action Recognition Using Global Point Feature Histograms and Action Shapes”

October 29th, 2009 Irfan Essa Posted in Activity Recognition, Franzi Meier, Intelligent Environments, Michael Beetz, Papers No Comments »

Radu Bogdan Rusu, Jan Bandouch, Franziska Meier, Irfan Essa and Michael Beetz (2009) “Human Action Recognition Using Global Point Feature Histograms and Action Shapes”, in Journal of Advanced Robotics, volume 23, pages 1873–1908, Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009. [ DOI | PDF]

Abstract

This paper investigates the recognition of human actions from three-dimensional (3-D) point clouds that encode the motions of people acting in sensor-distributed indoor environments. Data streams are time sequences of silhouettes extracted from cameras in the environment. From the 2-D silhouette contours we generate space–time streams by continuously aligning and stacking the contours along the time axis as third spatial dimension. The space–time stream of an observation sequence is segmented into parts corresponding to subactions using a pattern matching technique based on suffix trees and interval scheduling. Then, the segmented space–time shapes are processed by treating the shapes as 3-D point clouds and estimating global point feature histograms for them. The resultant models are clustered using statistical analysis and our experimental results indicate that the presented methods robustly derive different action classes. This holds despite large intra-class variance in the recorded datasets due to performances from different persons at different time intervals.

© Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009

Overview of the approach.

Overview of the approach.

Keywords: Action recognition, point cloud, global features, action segmentation

AddThis Social Bookmark Button

Paper in Artificial Intelligence (2009): “A novel sequence representation for unsupervised analysis of human activities”

September 20th, 2009 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Charles Isbell, Papers, Raffay Hamid, Siddhartha Maddi No Comments »

A novel sequence representation for unsupervised analysis of human activities

  • R. Hamid, S. Maddi, A. Johnson, A. Bobick, I. Essa, and C. Isbell (2009), “A Novel Sequence Representation for Unsupervised Analysis of Human Activities,” Artificial Intelligence Journal, 2009. [BIBTEX]
    @article{2009-Hamid-NSRUAHA,
      Author = {R. Hamid and S. Maddi and A. Johnson and A. Bobick and I. Essa and C. Isbell},
      Date-Modified = {2011-12-08 21:27:48 +0000},
      Journal = {Artificial Intelligence Journal},
      Month = {May},
      Title = {A Novel Sequence Representation for Unsupervised Analysis of Human Activities},
      Year = {2009}}

Abstract

Formalizing computational models for everyday human activities remains an open challenge. Many previous approaches towards this end assume prior knowledge about the structure of activities, using which explicitly defined models are learned in a completely supervised manner. For a majority of everyday environments however, the structure of the in situ activities is generally not known a priori. In this paper we investigate knowledge representations and manipulation techniques that facilitate learning of human activities in a minimally supervised manner. The key contribution of this work is the idea that global structural information of human activities can be encoded using a subset of their local event subsequences, and that this encoding is sufficient for activity-class discovery and classification.

In particular, we investigate modeling activity sequences in terms of their constituent subsequences that we call event n-grams. Exploiting this representation, we propose a computational framework to automatically discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding characterizations of these discovered classes from a holistic as well as a by-parts perspective. Using such characterizations, we present a method to classify a new activity to one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our approach in a variety of everyday environments.

Keywords: Temporal reasoning; Scene analysis; Computer vision

Hamid et al AIJ Paper

AddThis Social Bookmark Button