AT HIGH Museum/Lumière’s Fall Lecture and Panel Discussion on “Art In The Digital Culture… Threat or Opportunity?”

September 8th, 2012 Irfan Essa Posted in Computational Photography and Video, In The News, Presentations No Comments »

Wednesday, September 19, 2012, 7:00pm in the Hill Auditorium, High Museum, Atlanta.

In this sixth installment of Lumière’s Fall Lecture Series, Shannon Perich, curator of the photographic history collection at the National Museum of American History, Smithsonian Institution, and Irfan Essa of the Georgia Institute of Technology will each speak to the future of art in a rapidly expanding digital culture. Their commentary will be followed by a panel discussion with audience participation. The panel will address the threats and opportunities created by a growing range of capabilities to create, distribute, and interact with art. Additional information is available at www.lumieregallery.net. This lecture is a collaborative event with the Atlanta Celebrates Photography 2012 Festival.

via Lumière’s Fall Lecture and Panel Discussion.

SLIDES now available here


AT UBICOMP 2012 Conference, in Pittsburgh, PA, September 5 – 7, 2012

September 4th, 2012 Irfan Essa Posted in Edison Thomaz, Grant Schindler, Gregory Abowd, Papers, Presentations, Thomas Ploetz, UBICOMP, Ubiquitous Computing, Vinay Bettadapura No Comments »

At the ACM-sponsored 14th International Conference on Ubiquitous Computing (UbiComp 2012), Pittsburgh, PA, September 5-7, 2012.

Here are the highlights of my group’s participation in Ubicomp 2012.

  • E. Thomaz, V. Bettadapura, G. Reyes, M. Sandesh, G. Schindler, T. Ploetz, G. D. Abowd, and I. Essa (2012), “Recognizing Water-Based Activities in the Home Through Infrastructure-Mediated Sensing,” in Proceedings of ACM International Conference on Ubiquitous Computing (UBICOMP), 2012. [PDF] [WEBSITE] (Oral Presentation at 2pm on Wednesday September 5, 2012).
  • J. Wang, G. Schindler, and I. Essa (2012), “Orientation Aware Scene Understanding for Mobile Camera,” in Proceedings of ACM International Conference on Ubiquitous Computing (UBICOMP), 2012. [PDF][WEBSITE] (Oral Presentation at 2pm on Thursday September 6, 2012).

In addition, my colleague Gregory Abowd has a position paper, “What next, Ubicomp? Celebrating an intellectual disappearing act,” in the Wednesday 11:15am session, and my colleague and collaborator Thomas Ploetz has a paper, “Automatic Assessment of Problem Behavior in Individuals with Developmental Disabilities,” with co-authors Nils Hammerla, Agata Rozga, Andrea Reavis, Nathan Call, and Gregory Abowd, in the 9:15am session on Friday, September 7.


AT Texas Instruments to give a Talk on “Video Stabilization and Rolling Shutter Removal on YouTube”

August 22nd, 2012 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, Presentations, Vivek Kwatra No Comments »

Video Stabilization and Rolling Shutter Removal on YouTube

Abstract

In this talk, I will go over a variety of approaches my group is working on for video analysis and enhancement. In particular, I will describe our approach for a video stabilizer (currently implemented on YouTube) and its extensions. This work is in collaboration with Matthias Grundmann and Vivek Kwatra at Google. This method generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions [1]. We compute camera paths that are optimally partitioned into constant, linear, and parabolic segments, mimicking the camera motions employed by professional cinematographers. To this end, we propose a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond the conventional filtering that only suppresses high-frequency jitter. An additional challenge in videos shot on mobile phones is rolling shutter distortion. Modern CMOS cameras capture the frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. I will demonstrate a solution based on a novel mixture model of homographies parametrized by scanline blocks to correct these rolling shutter distortions [2]. Our method does not rely on a priori knowledge of the readout time, nor does it require prior camera calibration. A thorough evaluation based on a user study demonstrates a general preference for our algorithm. (Simplified sketches of both ideas appear below.)
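To make the camera-path optimization concrete, here is a minimal sketch of the L1-optimal path idea for a 1-D path, written with cvxpy. It only illustrates the formulation in [1], not the production YouTube stabilizer; the derivative weights and crop bound are illustrative values.

```python
# Hedged sketch of L1-optimal camera-path smoothing (see [1]); not the
# production implementation. Given a jittery 1-D camera path c_t, find a
# smooth path p_t whose first, second, and third differences are sparse
# (piecewise constant / linear / parabolic segments), while p_t stays within
# a crop window around c_t. Weights and bounds below are illustrative.
import numpy as np
import cvxpy as cp

def l1_smooth_path(c, crop_radius=20.0, w1=10.0, w2=1.0, w3=100.0):
    """c: (T,) original camera path, e.g. cumulative x-translation per frame."""
    T = len(c)
    p = cp.Variable(T)
    d1 = p[1:] - p[:-1]            # velocity
    d2 = d1[1:] - d1[:-1]          # acceleration
    d3 = d2[1:] - d2[:-1]          # jerk
    objective = cp.Minimize(w1 * cp.norm1(d1) + w2 * cp.norm1(d2) + w3 * cp.norm1(d3))
    constraints = [cp.abs(p - c) <= crop_radius]   # keep the crop window inside the frame
    cp.Problem(objective, constraints).solve()
    return np.asarray(p.value)
```

For the rolling shutter part, a similarly hedged sketch of the mixture-of-homographies idea in [2]: each block of scanlines carries its own homography, and a pixel's warp is a smooth, row-dependent blend of the block homographies. Estimating the per-block homographies is omitted here, and the Gaussian weighting is an assumption for illustration, not the paper's exact parameterization.

```python
# Simplified sketch of blending per-scanline-block homographies (see [2]);
# not the paper's estimation procedure.
import numpy as np

def warp_point(x, y, H_blocks, block_centers, sigma=30.0):
    """H_blocks: list of 3x3 homographies, one per scanline block;
    block_centers: (K,) array of the blocks' center rows."""
    weights = np.exp(-0.5 * ((y - np.asarray(block_centers)) / sigma) ** 2)
    weights /= weights.sum()
    p = np.array([x, y, 1.0])
    out = np.zeros(2)
    for w, H in zip(weights, H_blocks):
        q = H @ p
        out += w * (q[:2] / q[2])   # dehomogenize each block's warp, then blend
    return out
```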

I will conclude the talk by showcasing a live demo of the stabilizer and, time permitting, discuss some other projects we are working on.

[1] Matthias Grundmann, Vivek Kwatra, and Irfan Essa, CVPR 2011, www.cc.gatech.edu/cpl/projects/videostabilization

[2] Matthias Grundmann, Vivek Kwatra, Daniel Castro, and Irfan Essa, ICCP 2012 (Best Paper), www.cc.gatech.edu/cpl/projects/rollingshutter


At CVPR 2012, in Providence, RI, June 16 – 21, 2012

June 17th, 2012 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Presentations, Vivek Kwatra No Comments »

IEEE CVPR 2012 is in Providence, RI, from June 16-21, 2012.

A busy week ahead, meeting good friends and colleagues. Here are some highlights of what my group is involved with.

Paper in Main Conference

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [Poster on Tuesday 6/19/2012]

Demo in Main Conference

  • M. Grundmann, V. Kwatra, D. Castro, and I. Essa (2012), “Calibration-Free Rolling Shutter Removal.” [WEBSITE] [VIDEO] (Paper in ICCP 2012) [Demo on Monday and Tuesday (6/18-19) at the Google Booth]

Invited Talk in Workshop


AT IWCV 2012: “Video Understanding: Extracting Content and Context from Video.”

May 24th, 2012 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Presentations, Visual Surveillance No Comments »

Video Understanding: Extracting Content and Context from Video.

(Presentation at the International Workshop on Computer Vision 2012, Ortigia, Siracusa, Sicily, May 22-24, 2012.)

Irfan Essa
Georgia Tech

Abstract

In this talk, I will describe various efforts aimed at extracting context and content from video. I will highlight some of our recent work on extracting spatio-temporal features and the related saliency information from video, which can be used to detect and localize regions of interest. Then I will describe approaches that use structured and unstructured representations to recognize complex and extended-time actions. I will also discuss the need for unsupervised activity discovery and the detection of anomalous activities in videos. I will show a variety of examples, including online videos, mobile videos, surveillance and home-monitoring video, and sports videos. Finally, I will pose a series of questions and make observations about how we need to extend our current paradigms of video understanding to go beyond local spatio-temporal features and standard time-series and bag-of-words models.
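As a concrete example of the “local spatio-temporal features plus bag-of-words” baseline mentioned above, here is a minimal, hedged sketch using scikit-learn. The spatio-temporal feature extractor itself is assumed to exist (each video contributes one descriptor per detected interest point); the codebook size and linear classifier are illustrative choices, not the settings from our work.

```python
# Hedged sketch of a bag-of-visual-words pipeline over (assumed) spatio-temporal
# descriptors: build a k-means codebook, histogram each video's descriptors,
# and train a linear classifier over the histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_codebook(descriptor_sets, k=200):
    """descriptor_sets: list of (n_i, d) arrays, one per training video."""
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(descriptor_sets))

def bow_histogram(descriptors, codebook):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)          # L1-normalized word histogram

def train_activity_classifier(descriptor_sets, labels, k=200):
    codebook = build_codebook(descriptor_sets, k)
    X = np.vstack([bow_histogram(d, codebook) for d in descriptor_sets])
    return codebook, LinearSVC().fit(X, labels)
```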


Presentation to the New/Incoming Graduate Students at the College of Computing (August 2011).

August 18th, 2011 Irfan Essa Posted in Presentations No Comments »


Presentation (2011) at IBPRIA 2011: “Spatio-Temporal Video Analysis and Visual Activity Recognition”

June 8th, 2011 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Presentations No Comments »

“Spatio-Temporal Video Analysis and Visual Activity Recognition,” at the Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA) 2011, Las Palmas de Gran Canaria, Spain, June 8-10, 2011.

Abstract

My research group is focused on a variety of approaches for (a) low-level video analysis and synthesis and (b) recognizing activities in videos. In this talk, I will concentrate on two of our recent efforts: one aimed at robust spatio-temporal segmentation of video, and another that uses motion and flow to recognize and predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this work, we begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video retargeting that use spatio-temporal saliency derived from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research.)
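A much-simplified sketch of the iterative region-graph grouping described above (an illustration, not the CVPR 2010 implementation): regions are summarized here by mean color only, adjacent regions are merged greedily when their appearance difference falls below a threshold, and the threshold grows at each level to yield a tree of coarser segmentations. The real system uses richer appearance models, updates region statistics after merges, and guides temporal edges with dense optical flow.

```python
# Hedged sketch of hierarchical region-graph grouping; region features and
# thresholds are deliberately simplistic.
import numpy as np

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def hierarchical_merge(mean_colors, edges, levels=3, tau=0.1, growth=2.0):
    """mean_colors: (N, 3) mean color per initial space-time region;
    edges: list of (i, j) pairs of spatio-temporally adjacent regions.
    Returns one label array per level, finest to coarsest."""
    uf, hierarchy = UnionFind(len(mean_colors)), []
    for _ in range(levels):
        # Weight each region-graph edge by the color difference of its current roots.
        weighted = sorted(
            (float(np.linalg.norm(mean_colors[uf.find(i)] - mean_colors[uf.find(j)])), i, j)
            for i, j in edges)
        for w, i, j in weighted:
            if w < tau:
                uf.union(i, j)          # merge similar neighboring regions
        hierarchy.append(np.array([uf.find(i) for i in range(len(mean_colors))]))
        tau *= growth                   # allow coarser grouping at the next level
    return hierarchy
```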

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the playing field. To this end, we propose a novel approach to detect the locations where the play evolution will proceed, e.g., where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground-level sparse movement of players in each time-step, and then generate a dense motion field. Using this field, we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach on soccer, basketball, and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research.)
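To illustrate the dense-motion-field step, here is a hedged numpy sketch (the Gaussian splatting and parameter values are illustrative assumptions, not the paper's exact formulation): sparse per-player ground-plane velocities are interpolated into a dense field, and the point of strongest convergence, i.e., most negative divergence, is taken as a candidate for where the play is heading.

```python
# Hedged sketch: interpolate sparse player motion into a dense field and find
# where the field converges. Grid units and sigma are illustrative.
import numpy as np

def dense_motion_field(positions, velocities, grid_w, grid_h, sigma=5.0):
    """positions: list of (x, y) player locations on the ground plane;
    velocities: list of (vx, vy) vectors, one per player."""
    ys, xs = np.mgrid[0:grid_h, 0:grid_w]
    field = np.zeros((grid_h, grid_w, 2))
    weight_sum = np.zeros((grid_h, grid_w))
    for (px, py), v in zip(positions, velocities):
        w = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        field += w[..., None] * np.asarray(v)
        weight_sum += w
    return field / np.maximum(weight_sum[..., None], 1e-8)

def convergence_point(field):
    # Divergence = d(vx)/dx + d(vy)/dy; strongly negative values indicate inflow,
    # i.e., a location the players are collectively moving toward.
    div = np.gradient(field[..., 0], axis=1) + np.gradient(field[..., 1], axis=0)
    return np.unravel_index(np.argmin(div), div.shape)   # (row, col)
```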

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website.


Fall 2010 GRASP Seminar: “Two Talks On Video Analysis: (1) Segmentation Of Video And (2) Prediction Of Actions In Video”

September 20th, 2010 Irfan Essa Posted in Computational Photography and Video, Presentations No Comments »

Fall 2010 GRASP Seminar – Irfan Essa, Georgia Institute Of Technology, “Two Talks On Video Analysis: (1) Segmentation Of Video And (2) Prediction Of Actions In Video” | GRASP Laboratory – University Of Pennsylvania.

Friday September 24, 2010 from 11:00am to 12:00pm

 

My research group is focused on a variety of approaches for video analysis and synthesis. In this talk, I will focus on two of our recent efforts: one aimed at robust spatio-temporal segmentation of video, and another that uses motion and flow to predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this effort, we begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video retargeting that use the saliency from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research.)

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations where the play evolution will proceed, e.g., where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground-level sparse movement of players in each time-step, and then generate a dense motion field. Using this field, we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach on soccer, basketball, and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research.)

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website at http://prof.irfanessa.com/
Presenter’s Biography:

Irfan Essa is a Professor in the School of Interactive Computing (IC) of the College of Computing (CoC), and an Adjunct Professor in the School of Electrical and Computer Engineering, Georgia Institute of Technology (GA Tech), in Atlanta, Georgia, USA.

Irfan Essa works in the areas of Computer Vision, Computer Graphics, Computational Perception, Robotics, and Computer Animation, with potential impact on Video Analysis and Production (e.g., Computational Photography & Video, Image-based Modeling and Rendering, etc.), Human Computer Interaction, and Artificial Intelligence research. Specifically, he is interested in the analysis, interpretation, authoring, and synthesis of video, with the goals of building aware environments, recognizing and modeling human activities and behaviors, and developing dynamic and generative representations of time-varying streams. He has published over 150 scholarly articles in leading journals and conference venues on these topics and has received awards for his research and teaching.

He joined the Georgia Tech faculty in 1996 after earning his MS (1990) and Ph.D. (1994) at the Massachusetts Institute of Technology (Media Lab) and holding a research faculty position there [1988-1996]. His doctoral research was in the area of facial recognition, analysis, and synthesis.

 


Presentation at International Workshop on Video (2009): “Temporal Representations of Video for Analysis and Synthesis”

May 26th, 2009 Irfan Essa Posted in Computational Photography and Video, Presentations No Comments »

“Temporal Representations of Video for Analysis and Synthesis” at IWV09: International Workshop on Video, in Barcelona, Spain, May 25-27, 2009.

(Slides, NO Video)

Abstract

I will present a variety of temporal models of video that we have been studying (and developing) for analysis and synthesis of video. For synthesis of videos, we have been developing representations that support example-based re-synthesis and spatio-temporal re-targeting. These approaches build on graph-based methods, and we present techniques for similarity metrics for video, segmentation in video, and merging of different video streams. I will showcase a series of examples of these approaches applied to generate new videos.

For analysis of videos, we have developed a series of representations to observe and model activities in videos. Building on low-level measures of movement and motion in videos, we have incorporated higher-level temporal generative models to represent and recognize observed activities. I will discuss the strengths of a variety of State-based, Markovian, Grammar-based and Network-based representations that we have employed for recognizing activities from video. I will also discuss approaches for unsupervised discovery and recognition of activities.
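As a minimal example of the Markovian/state-based family of models mentioned above (a toy illustration, not any specific system of ours), the scaled forward algorithm below scores a discrete observation sequence under an HMM; competing per-activity HMMs can then be compared by log-likelihood to recognize which activity best explains an observed feature sequence.

```python
# Toy illustration of a state-based (HMM) activity model: score an observation
# sequence with the scaled forward algorithm. All parameters are made-up
# example values, not learned ones.
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """obs: list of symbol indices; pi: (S,) initial state probabilities;
    A: (S, S) transition matrix; B: (S, V) emission probabilities."""
    alpha = pi * B[:, obs[0]]
    log_lik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # propagate, then weight by emission
        c = alpha.sum()                   # scaling factor avoids underflow
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik

def classify(obs, models):
    """models: dict mapping activity name -> (pi, A, B); pick the best-scoring model."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```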

Time permitting, I will describe some new efforts that move towards understanding mobile imaging and video, video authoring, and video on the web. Within these, I will discuss issues of collaborative imaging, collective authoring, ad-hoc sensor networks, and peer production with images and videos. Using these concepts to focus the conversation, I will discuss how all of these issues are impacting the field of Journalism and Reporting, and how we have started a new interdisciplinary research and education effort we call Computational Journalism.


Presentation at CMU’s Computational Thinking Seminar Series (2009): “From Computational Photography and Video to Computational Journalism”

March 10th, 2009 Irfan Essa Posted in Computational Journalism, Computational Photography and Video, Presentations 1 Comment »

From Computational Photography and Video to Computational Journalism

Irfan Essa
Georgia Institute of Technology
School of Interactive Computing, GVU and RIM Centers
April 21, 2009.

(see the video of this presentation)

Abstract


Our consumption of images (photography/video) continues to grow with the pervasiveness of computing (networking, mobile, and media) technologies in our daily lives. Everyone now has a mobile camera, and digital image capture, processing, and sharing have become ubiquitous in our society. This has had a significant impact on how we want to (a) create novel scenes, (b) share our experiences with images, and (c) interact with large amounts of images and videos from many sources. In this talk, I will start with a brief overview of a series of ongoing efforts in the analysis of images and videos for rendering novel scenes, interacting with images/videos, and collaboratively authoring new content. I will describe some work on video-based rendering and synthesizing novel videos (and scenes) and highlight the technical contributions being made in the areas of Computational Photography and Video.

Using these efforts as a foundation, I will showcase where things are headed in terms of user-generated content, media sharing, annotation, and reuse with large-scale networks. In essence, everybody is a content producer, distributor, and consumer. I will describe some new efforts that move towards understanding mobile imaging and video, and also discuss issues of collaborative imaging, collective authoring, ad-hoc sensor networks, and peer production with images and videos. Using these concepts, I will discuss how all of these issues are impacting the field of Journalism and Reporting, and how we have started a new interdisciplinary research and education effort we call Computational Journalism. The concept of Computational Journalism includes more than just imaging; it relates to media and information in general and is aimed at the study of how we remain informed in this connected world. I will outline this new field and relate it back to imaging, with examples from some of our recent work in this area.
