
Kihwan Kim’s Thesis Defense (2011): “Spatio-temporal Data Interpolation for Dynamic Scene Analysis”

December 6th, 2011 Irfan Essa Posted in Computational Photography and Video, Kihwan Kim, Modeling and Animation, Multimedia, PhD, Security, Visual Surveillance, WWW

Spatio-temporal Data Interpolation for Dynamic Scene Analysis

Kihwan Kim, PhD Candidate

School of Interactive Computing, College of Computing, Georgia Institute of Technology

Date: Tuesday, December 6, 2011

Time: 1:00 pm – 3:00 pm EST

Location: Technology Square Research Building (TSRB) Room 223

Abstract

Analysis and visualization of dynamic scenes is often constrained by the amount of spatio-temporal information available from the environment. In most scenarios, we have to account for incomplete information and sparse motion data, requiring us to employ interpolation and approximation methods to fill in the missing information. Scattered data interpolation and approximation techniques have been widely used for solving the problem of completing surfaces and images with incomplete input data. We introduce approaches for such data interpolation and approximation from limited sensors into the domain of analyzing and visualizing dynamic scenes. Data from dynamic scenes is subject to constraints due to the spatial layout of the scene and/or the configurations of video cameras in use. Such constraints include: (1) sparsely available cameras observing the scene, (2) limited field of view provided by the cameras in use, (3) incomplete motion at a specific moment, and (4) varying frame rates due to different exposures and resolutions.

In this thesis, we establish these forms of incompleteness in the scene as spatio-temporal uncertainties, and propose solutions for resolving the uncertainties by applying scattered data approximation in a spatio-temporal domain.

The main contributions of this research are as follows: First, we provide an efficient framework to visualize large-scale dynamic scenes from distributed static videos. Second, we adapt Radial Basis Function (RBF) interpolation to the spatio-temporal domain to generate global motion tendency. The tendency, represented by a dense flow field, is used to optimally pan and tilt a video camera. Third, we propose a method to represent motion trajectories using stochastic vector fields. Gaussian Process Regression (GPR) is used to generate a dense vector field and the certainty of each vector in the field. The generated stochastic fields are used for recognizing motion patterns under varying frame rates and incompleteness of the input videos. Fourth, we show that the stochastic representation of the vector field can also be used for modeling global tendency to detect regions of interest in dynamic scenes with camera motion. We evaluate and demonstrate our approaches in several applications for visualizing virtual cities, automating sports broadcasting, and recognizing traffic patterns in surveillance videos.
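To make the third contribution concrete, here is a minimal sketch of Gaussian Process Regression producing a dense vector field together with a per-vector certainty. It is not the thesis implementation: the squared-exponential kernel, its length scale, the noise level, and the sample data are all assumptions chosen for illustration, and the linear solver is a plain Gaussian elimination to keep the example free of dependencies. With the noise term set to zero, the predicted mean coincides with a Gaussian RBF interpolant (without a polynomial term), which loosely connects it to the second contribution.

```python
import math

def kernel(p, q, length=2.0):
    """Squared-exponential covariance between two (x, y, t) samples (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * length ** 2))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gp_predict(samples, values, query, noise=1e-6):
    """Return (mean, variance) of a zero-mean GP at `query`."""
    n = len(samples)
    k = [[kernel(samples[i], samples[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [kernel(query, s) for s in samples]
    alpha = solve(k, values)   # K^-1 y
    beta = solve(k, k_star)    # K^-1 k*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = kernel(query, query) - sum(ks * b for ks, b in zip(k_star, beta))
    return mean, max(var, 0.0)

# Hypothetical sparse observations: (x, y, t) positions and measured velocities.
samples = [(0, 0, 0), (2, 0, 0), (4, 0, 1), (0, 4, 1), (4, 4, 2)]
u = [1.0, 1.0, 0.5, 0.0, -0.5]
v = [0.0, 0.2, 0.5, 1.0, 0.5]

# Dense field on a 5x5 grid at time t = 1: each cell holds (u, v, variance).
# The variance depends only on the sample locations, so u and v share it.
field = {}
for gy in range(5):
    for gx in range(5):
        q = (gx, gy, 1)
        mean_u, var = gp_predict(samples, u, q)
        mean_v, _ = gp_predict(samples, v, q)
        field[(gx, gy)] = (mean_u, mean_v, var)
```

The variance is near zero at observed samples and approaches the prior variance of 1 far from any observation, which is the sense in which each vector carries a certainty.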

Committee:

  • Prof. Irfan Essa (Advisor, School of Interactive Computing, Georgia Institute of Technology)
  • Prof. James M. Rehg (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Thad Starner (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Greg Turk (School of Interactive Computing, Georgia Institute of Technology)
  • Prof. Jessica K. Hodgins (Robotics Institute, Carnegie Mellon University, and Disney Research Pittsburgh)

Event: CnJ Panel at Georgia Tech’s Future Media Fest 2011 | Computation + Journalism

November 15th, 2011 Irfan Essa Posted in Computational Journalism, Eric Gilbert, Events

Computational Journalism is defined as the application of computation to the activities of journalism such as information gathering, organization, communication, and dissemination of information, while upholding values of journalism such as accuracy and verifiability. Journalists are increasingly adopting and using the proliferation of open-source tools and embracing different styles of journalism. Explore how newsrooms are opening, what new tools are being created, and how to use those tools most effectively.

Panelists:

Topics of discussion will include (but will not be limited to):

  • What is Computational Journalism?
  • What impact has Computation / Information Technology / Networking Technology had on Journalism?
  • What is the newsroom of the future? How has the newsroom changed?
  • How has investigative journalism changed with new technologies?
  • How has social networking changed how we gather, distribute, and share news (and information)?
  • What are the economic / financial models that need to be explored to support (and sustain) journalism?
  • What is the role of an Editor in the new journalism model?
  • What should we be teaching the next generation of journalists?

via CnJ Panel at Georgia Tech’s Future Media Fest 2011 | Computation + Journalism.


Paper in ICCV 2011: “Gaussian Process Regression Flow for Analysis of Motion Trajectories”

October 28th, 2011 Irfan Essa Posted in Activity Recognition, DARPA, Kihwan Kim, PAMI/ICCV/CVPR/ECCV, Papers

Gaussian Process Regression Flow for Analysis of Motion Trajectories

  • Kim, Lee, and Essa (2011), “Gaussian Process Regression Flow for Analysis of Motion Trajectories,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011. [PDF] [WEBSITE] [VIDEO] [BIBTEX]
     @inproceedings{Kim2011-GPRF,
       Author = {K. Kim and D. Lee and I. Essa},
       Booktitle = {Proceedings of IEEE International Conference on Computer Vision (ICCV)},
       Month = {November},
       Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Kim-GPRFAMT.pdf},
       Publisher = {IEEE Computer Society},
       Title = {Gaussian Process Regression Flow for Analysis of Motion Trajectories},
       Url = {http://www.cc.gatech.edu/cpl/projects/gprf/},
       Video = {http://www.youtube.com/watch?v=UtLr37hDQz0},
       Year = {2011}
     }

Abstract

Analysis and recognition of motions and activities of objects in videos requires effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data.

Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. This representation also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic monitoring domains and test on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates.
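As a rough illustration of how a flow field with per-vector certainty can be used to flag anomalous trajectories across different frame rates, the sketch below scores a track by the average squared Mahalanobis distance between its observed velocities and the field's expected velocity. This is a simplified stand-in, not the matching procedure from the paper: `mean_flow` is a hypothetical, hand-written field (uniform motion in +x with a fixed variance) rather than one learned by regression, and the score ignores the handling of pauses and stops that the paper describes.

```python
def mean_flow(x, y):
    """Hypothetical learned class: traffic moving in +x at 1 unit/s.

    Returns ((mean_vx, mean_vy), variance); a real field would vary with (x, y).
    """
    return (1.0, 0.0), 0.05

def anomaly_score(track):
    """Average squared Mahalanobis distance of observed vs. expected velocity.

    `track` is a list of (t, x, y) samples; timestamps may be irregular,
    so velocities are displacements divided by the actual time step.
    """
    total, count = 0.0, 0
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        (mx, my), var = mean_flow(x0, y0)
        total += ((vx - mx) ** 2 + (vy - my) ** 2) / var
        count += 1
    return total / count if count else float("inf")

# Irregularly sampled track that follows the expected flow.
normal = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (2.0, 2.0, 0.1), (3.0, 3.0, 0.1)]
# Track moving against the expected flow.
wrong_way = [(0.0, 3.0, 0.0), (1.0, 2.0, 0.0), (2.0, 1.0, 0.0)]

print(anomaly_score(normal), anomaly_score(wrong_way))
```

Because the score can be accumulated one segment at a time, it could in principle be updated online as new samples arrive; the threshold separating normal from anomalous would have to be chosen per data set.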


In the News (2011): “Shake it like an Instagram picture — Online Video News”

September 15th, 2011 Irfan Essa Posted in Collaborators, Computational Photography and Video, Google, In The News, Matthias Grundmann, Vivek Kwatra, WWW

YouTube effects: Shake it like an Instagram picture

via YouTube effects: Shake it like an Instagram picture — Online Video News.

YouTube users can now apply a number of Instagram-like effects to their videos, giving them a cartoonish or Lomo-like look with the click of a button. The effects are part of a new editing feature that also includes cropping and advanced image stabilization.

Taking the shaking out of video uploads should go a long way towards making some of the amateur footage captured on mobile phones more watchable, but it can also be resource-intensive — which is why Google’s engineers invented an entirely new approach toward image stabilization.

The new editing functionality will be part of YouTube’s video page, where a new “Edit video” button will offer access to filters and other editing functionality. This type of post-processing is separate from YouTube’s video editor, which allows users to produce new videos based on existing clips.


Funding (2011) NSF (1146352) “EAGER: Linguistic Task Transfer for Humans and Cyber Systems”

September 1st, 2011 Irfan Essa Posted in Activity Recognition, Mike Stilman, NSF, Robotics

EAGER: Linguistic Task Transfer for Humans and Cyber Systems (Mike Stilman, Irfan Essa) NSF/RI

This project, investigating formal languages as a general methodology for task transfer between distinct cyber-physical systems such as humans and robots, aims to expand the science of cyber physical systems by developing Motion Grammars that will enable task transfer between distinct systems.

Formal languages are tools for encoding, describing and transferring structured knowledge. In natural language, the latter process is called communication. Similarly, we will develop a formal language through which arbitrary cyber-physical systems communicate tasks via structured actions. This investigation of Motion Grammars will contribute to the science of human cognition and the engineering of cyber-physical algorithms. By observing human activities during manipulation we will develop a novel class of hybrid control algorithms based on linguistic representations of task execution. These algorithms will broaden the capabilities of man-made systems and provide the infrastructure for motion transfer between humans, robots and broader systems in a generic context. Furthermore, the representation in a rigorous grammatical context will enable formal verification and validation in future work.
Broader Impacts: The proposed research has direct applications to new solutions for manufacturing, medical treatments such as surgery, logistics and food processing. In turn, each of these areas has a significant impact on the efficiency and convenience of our daily lives. The PIs serve as coordinators of graduate/undergraduate programs and mentors to community schools. In order to guarantee that women and minorities have a significant role in the research, the PIs will annually invite K-12 students from Atlanta schools with primarily African American populations to the laboratories. One-day robot classes will be conducted that engage students in the excitement of hands-on science by interactively using lab equipment to transfer their manipulation skills to a robot arm.

Via Award#1146352 – EAGER: Linguistic Task Transfer for Humans and Cyber Systems.


DEMO (2011): Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths – from Google Research Blog

June 20th, 2011 Irfan Essa Posted in Computational Photography and Video, In The News, Matthias Grundmann, Mobile Computing, PAMI/ICCV/CVPR/ECCV, Vivek Kwatra

via Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths – Google Research Blog.

Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths
Posted by Matthias Grundmann, Vivek Kwatra, and Irfan Essa

Earlier this year, we announced the launch of new features on the YouTube Video Editor, including stabilization for shaky videos, with the ability to preview them in real-time. The core technology behind this feature is detailed in this paper, which will be presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).

Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. On the other hand, most professionally shot videos consist of carefully designed camera configurations, using specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our goal was to devise a completely automatic method for converting casual shaky footage into more pleasant and professional-looking videos.

Our technique mimics the cinematographic principles outlined above by automatically determining the best camera path using a robust optimization technique. The original, shaky camera path is divided into a set of segments, each approximated by either a constant, linear or parabolic motion. Our optimization finds the best of all possible partitions using a computationally efficient and stable algorithm.

To achieve real-time performance on the web, we distribute the computation across multiple machines in the cloud. This enables us to provide users with a real-time preview and interactive control of the stabilized result. Above we provide a video demonstration of how to use this feature on the YouTube Editor. We will also demo this live at Google’s exhibition booth at CVPR 2011.
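The idea of distributing the computation can be sketched on a single machine as splitting a camera path into chunks, processing the chunks concurrently, and reassembling the results in order. Everything here is an assumption made for illustration: worker threads stand in for cloud machines, a moving average stands in for the actual stabilization, and `chunk_size`, `radius`, and `workers` are made-up parameters. Each chunk reads a few neighboring frames beyond its boundaries so that the chunked result matches the unchunked one.

```python
from concurrent.futures import ThreadPoolExecutor

def split_chunks(n_frames, chunk_size):
    """Return (start, end) frame ranges covering 0..n_frames."""
    return [(s, min(s + chunk_size, n_frames))
            for s in range(0, n_frames, chunk_size)]

def stabilize_chunk(job):
    """Smooth one chunk; the window may reach into neighboring chunks."""
    path, (start, end), radius = job
    out = []
    for i in range(start, end):
        lo, hi = max(0, i - radius), min(len(path), i + radius + 1)
        out.append(sum(path[lo:hi]) / (hi - lo))
    return out

def stabilize_parallel(path, chunk_size=4, radius=1, workers=3):
    """Process chunks concurrently and concatenate them in frame order."""
    jobs = [(path, span, radius) for span in split_chunks(len(path), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(stabilize_chunk, jobs))  # map preserves job order
    return [value for part in parts for value in part]

# Hypothetical 1-D camera positions for ten frames.
shaky = [0.0, 1.4, 1.7, 3.3, 3.8, 5.0, 6.4, 6.9, 8.2, 9.0]
smooth = stabilize_parallel(shaky)
print(smooth)
```

Because `pool.map` yields results in submission order, earlier chunks can be handed to a preview as soon as they finish, which is the property a streaming preview would rely on.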

For more details, see the Project Site. See the YouTube video of the system, the paper in PDF, and a technical video of the work.



Paper (2011) in IEEE CVPR: “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths”

June 19th, 2011 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra

Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths

  • Grundmann, Kwatra, and Essa (2011), “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.  [PDF] [WEBSITE][VIDEO] [DEMO][Google Research Blog] [BIBTEX]
     @inproceedings{2011-Grundmann-AVSWROCP,
       Author = {M. Grundmann and V. Kwatra and I. Essa},
       Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
       Month = {June},
       Pdf = {http://www.cc.gatech.edu/~irfan/p/2011-Grundmann-AVSWROCP},
       Publisher = {IEEE Computer Society},
       Title = {Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths},
       Url = {http://www.cc.gatech.edu/cpl/projects/videostabilization/},
       Video = {http://www.youtube.com/watch?v=i5keG1Y810U},
       Year = {2011}
     }

Abstract

We present a novel algorithm for automatically applying constrainable, L1-optimal camera paths to generate stabilized videos by removing undesired motions. Our goal is to compute camera paths that are composed of constant, linear and parabolic segments mimicking the camera motions employed by professional cinematographers. To this end, our algorithm is based on a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond the conventional filtering of camera paths that only suppresses high frequency jitter. We incorporate additional constraints on the path of the camera directly in our algorithm, allowing for stabilized and retargeted videos. Our approach accomplishes this without the need of user interaction or costly 3D reconstruction of the scene, and works as a post-process for videos from any camera or from an online source.
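The objective described in the abstract can be illustrated with a small sketch that evaluates a weighted L1 norm of the first, second, and third finite differences of a one-dimensional camera path. This only evaluates the cost; it does not solve the linear program or model the additional path constraints mentioned above, and the weights `w1`, `w2`, `w3` as well as the sample paths are assumptions. The intuition is that an L1 penalty favors derivatives that are exactly zero over stretches of the path, which yields constant segments (zero first derivative), linear segments (zero second derivative), and parabolic segments (zero third derivative).

```python
def diff(seq):
    """Forward finite difference of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]

def l1_path_cost(path, w1=10.0, w2=1.0, w3=100.0):
    """Weighted L1 norm of the first three derivatives (weights are assumed)."""
    d1 = diff(path)
    d2 = diff(d1)
    d3 = diff(d2)
    return (w1 * sum(abs(x) for x in d1)
            + w2 * sum(abs(x) for x in d2)
            + w3 * sum(abs(x) for x in d3))

constant = [2.0] * 6                            # static camera
linear = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]         # constant-velocity pan
parabolic = [0.5 * t * t for t in range(6)]     # constant-acceleration ease-in
shaky = [0.0, 1.4, 1.7, 3.3, 3.8, 5.0]          # same endpoints as `linear`

for name, p in [("constant", constant), ("linear", linear),
                ("parabolic", parabolic), ("shaky", shaky)]:
    print(name, l1_path_cost(p))
```

The shaky path covers the same distance as the linear one, so their first-derivative terms match; the difference in cost comes entirely from the second and third derivative terms, which is what the optimization would drive toward zero.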


Presentation (2011) at IBPRIA 2011: “Spatio-Temporal Video Analysis and Visual Activity Recognition”

June 8th, 2011 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Presentations

“Spatio-Temporal Video Analysis and Visual Activity Recognition” at the Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA) 2011 in Las Palmas de Gran Canaria, Spain, June 8-10.

Abstract

My research group is focused on a variety of approaches for (a) low-level video analysis and synthesis and (b) recognizing activities in videos. In this talk, I will concentrate on two of our recent efforts: one aimed at robust spatio-temporal segmentation of video, and another that uses motion and flow to recognize and predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this work, we begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video retargeting that use spatio-temporal saliency derived from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research).
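The two-step loop described above (over-segment, then build a region graph and repeat) can be sketched on a toy graph. This is a simplification under stated assumptions, not the published algorithm: nodes carry a single scalar "appearance" value instead of color or histograms, the merge test is a Felzenszwalb-Huttenlocher-style threshold picked for illustration, the graph is a short chain rather than a space-time voxel grid, and the scale parameter `k` at each level is made up.

```python
class DisjointSet:
    """Union-find that also tracks region size and largest internal edge."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b, weight):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], weight)

def segment(values, edges, k):
    """Merge nodes along edges in order of appearance difference."""
    ds = DisjointSet(len(values))
    for w, i, j in sorted((abs(values[i] - values[j]), i, j) for i, j in edges):
        a, b = ds.find(i), ds.find(j)
        if a == b:
            continue
        limit = min(ds.internal[a] + k / ds.size[a],
                    ds.internal[b] + k / ds.size[b])
        if w <= limit:
            ds.union(a, b, w)
    roots = [ds.find(i) for i in range(len(values))]
    relabel = {r: n for n, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]

def region_graph(values, edges, labels):
    """Collapse each region to one node with the mean appearance."""
    n = max(labels) + 1
    sums, counts = [0.0] * n, [0] * n
    for v, l in zip(values, labels):
        sums[l] += v
        counts[l] += 1
    means = [s / c for s, c in zip(sums, counts)]
    region_edges = sorted({(min(labels[i], labels[j]), max(labels[i], labels[j]))
                           for i, j in edges if labels[i] != labels[j]})
    return means, region_edges

# Toy chain of nine nodes with three visually distinct groups.
values = [10, 11, 10, 30, 31, 29, 80, 82, 81]
edges = [(i, i + 1) for i in range(len(values) - 1)]

fine = segment(values, edges, k=3)                      # level 0: over-segmentation
means, r_edges = region_graph(values, edges, fine)
coarse_regions = segment(means, r_edges, k=30)          # level 1: merge regions
coarse = [coarse_regions[l] for l in fine]              # map back to the nodes
print(fine, coarse)
```

Keeping both `fine` and `coarse` gives two levels of the segmentation tree, so a downstream application can pick the granularity it needs.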

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the playing field. To this end, we propose a novel approach that detects where the play is heading, e.g., where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the sparse ground-level movement of players at each time step and then generate a dense motion field. Using this field, we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach on soccer, basketball, and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research).
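A toy version of this pipeline, going from sparse player movements to a dense field and then to a convergence location, might look like the sketch below. It makes several assumptions that are not from the paper: the dense field is a Gaussian-weighted average of player velocities (a simple stand-in for the interpolation actually used), convergence is measured as the most negative finite-difference divergence, and the grid size, `sigma`, and player data are invented.

```python
import math

def dense_field(players, width, height, sigma=3.0):
    """Gaussian-weighted average of sparse velocities at every grid cell.

    `players` holds (x, y, vx, vy); the result is field[y][x] = (u, v).
    """
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            wsum = u = v = 0.0
            for px, py, vx, vy in players:
                w = math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                wsum += w
                u += w * vx
                v += w * vy
            row.append((u / wsum, v / wsum))
        field.append(row)
    return field

def divergence(field, x, y):
    """Central-difference divergence du/dx + dv/dy at an interior cell."""
    du_dx = (field[y][x + 1][0] - field[y][x - 1][0]) / 2.0
    dv_dy = (field[y + 1][x][1] - field[y - 1][x][1]) / 2.0
    return du_dx + dv_dy

def convergence_point(field):
    """Interior cell with the most negative divergence (strongest inflow)."""
    height, width = len(field), len(field[0])
    cells = [(x, y) for y in range(1, height - 1) for x in range(1, width - 1)]
    return min(cells, key=lambda c: divergence(field, c[0], c[1]))

# Four hypothetical players on an 11x11 ground plane, all moving toward (5, 5).
players = [(1, 5, 1.0, 0.0), (9, 5, -1.0, 0.0),
           (5, 1, 0.0, 1.0), (5, 9, 0.0, -1.0)]
field = dense_field(players, 11, 11)
cx, cy = convergence_point(field)
print((cx, cy), divergence(field, cx, cy))
```

A negative divergence means neighboring vectors point inward, so the selected cell is where the interpolated motion is flowing to, which is a plausible target for a camera to anticipate.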

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website.


PhD Fellowships from Google Research for Matthias Grundmann

May 16th, 2011 Irfan Essa Posted in Awards, In The News, Matthias Grundmann

Congratulations to Matthias Grundmann, winner of the Google PhD Fellowship in Computer Vision for 2012.

via PhD Fellowships – Google Research.

Google PhD Fellowship Program Overview

Nurturing and maintaining strong relations with the academic community is a top priority at Google. The Google U.S./Canada PhD Student Fellowship Program was created to recognize outstanding graduate students doing exceptional work in computer science, related disciplines, or promising research areas. Last year we awarded 14 unique fellowships to some amazing students in the US and Canada:

  • Matthias Grundmann, Google U.S./Canada Fellowship in Computer Vision (Georgia Institute of Technology)

Going Live on YouTube (2011): Lights, Camera… EDIT! New Features for the YouTube Video Editor

March 21st, 2011 Irfan Essa Posted in Computational Photography and Video, Google, In The News, Matthias Grundmann, Multimedia, Vivek Kwatra, WWW

via YouTube Blog: Lights, Camera… EDIT! New Features for the YouTube Video Editor.

Lights, Camera… EDIT! New Features for the YouTube Video Editor

Nine months ago we launched our cloud-based video editor. It was a simple product built to provide our users with simple editing tools. Although it didn’t have all the features available on paid desktop editing software, the idea was that the vast majority of people’s video editing needs are pretty basic and straightforward, and we could provide these features with a free editor available on the Web. Since launch, hundreds of thousands of videos have been published using the YouTube Video Editor and we’ve regularly pushed out new feature enhancements to the product, including:

  • Video transitions (crossfade, wipe, slide)
  • The ability to save projects across sessions
  • Increased clips allowed in the editor from 6 to 17
  • Video rotation (from portrait to landscape and vice versa – great for videos shot on mobile)
  • Shape transitions (heart, star, diamond, and Jack-O-Lantern for Halloween)
  • Audio mixing (AudioSwap track mixed with original audio)
  • Effects (brightness/contrast, black & white)
  • A new user interface and project menu for multiple saved projects

While many of these are familiar features also available on desktop software, today, we’re excited to unveil two new features that the team has been working on over the last couple of months that take unique advantage of the cloud:

Stabilizer

Ever shoot a shaky video that’s so jittery, it’s actually hard to watch? Professional cinematographers use stabilization equipment such as tripods or camera dollies to keep their shots smooth and steady. Our team mimicked these cinematographic principles by automatically determining the best camera path for you through a unified optimization technique. In plain English, you can smooth some of those unsteady videos with the click of a button. We also wanted you to be able to preview these results in real-time, before publishing the finished product to the Web. We do this by harnessing the power of the cloud: we split the computation required for stabilizing the video into chunks and distribute them across different servers. This allows us to use the power of many machines in parallel, computing and streaming the stabilized results quickly into the preview. You can check out the paper we’re publishing entitled “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths.” Want to see the stabilizer in action? You can test it out for yourself, or check out these two videos. The first is without the stabilizer.

And now, with the stabilizer:
