Presentation to the New/Incoming Graduate Students at the College of Computing (August 2011).

August 18th, 2011 Irfan Essa Posted in Presentations No Comments »

Presentation (2011) at IBPRIA 2011: “Spatio-Temporal Video Analysis and Visual Activity Recognition”

June 8th, 2011 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Kihwan Kim, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Presentations No Comments »

“Spatio-Temporal Video Analysis and Visual Activity Recognition” at the Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA) 2011 in Las Palmas de Gran Canaria, Spain, June 8-10.

Abstract

My research group is focused on a variety of approaches for (a) low-level video analysis and synthesis and (b) recognizing activities in videos. In this talk, I will concentrate on two of our recent efforts. One is aimed at robust spatio-temporal segmentation of video, and the other at using motion and flow to recognize and predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this work, we begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video-retargeting that use spatio-temporal saliency derived from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research).
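
To make the over-segment, build-a-region-graph, repeat idea concrete, here is a minimal Python sketch. It is not the implementation from the CVPR 2010 paper. It assumes a small grayscale volume, a Felzenszwalb-Huttenlocher-style merge criterion, region edges weighted by the difference of mean intensities (a simplification chosen for brevity), and a merge parameter `k` that doubles at each level. It also omits the optical-flow-guided temporal connections mentioned above. All function names are hypothetical.

```python
import numpy as np


class DisjointSet:
    """Union-find that also tracks region size and largest internal edge."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)


def merge_pass(n_nodes, edges, k):
    """One graph-based merge pass; returns a consecutive label per node."""
    n_nodes = int(n_nodes)
    ds = DisjointSet(n_nodes)
    for w, a, b in sorted(edges):
        ra, rb = ds.find(a), ds.find(b)
        if ra == rb:
            continue
        # Merge only if the edge is no heavier than both regions tolerate.
        limit = min(ds.internal[ra] + k / ds.size[ra],
                    ds.internal[rb] + k / ds.size[rb])
        if w <= limit:
            ds.union(ra, rb, w)
    roots = [ds.find(i) for i in range(n_nodes)]
    relabel = {r: i for i, r in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[r] for r in roots])


def voxel_edges(video):
    """6-connected edges of a (T, H, W) volume, weighted by intensity difference."""
    idx = np.arange(video.size).reshape(video.shape)
    edges = []
    for axis in range(3):
        a = np.moveaxis(idx, axis, 0)[:-1].ravel()
        b = np.moveaxis(idx, axis, 0)[1:].ravel()
        va = np.moveaxis(video, axis, 0)[:-1].ravel()
        vb = np.moveaxis(video, axis, 0)[1:].ravel()
        edges += [(float(abs(x - y)), int(i), int(j))
                  for x, y, i, j in zip(va, vb, a, b)]
    return edges


def region_edges(video, labels, edges):
    """Region graph: adjacent regions linked by difference of mean intensity."""
    n = int(labels.max()) + 1
    mean = (np.bincount(labels, weights=video.ravel(), minlength=n)
            / np.bincount(labels, minlength=n))
    pairs = {(min(labels[a], labels[b]), max(labels[a], labels[b]))
             for _, a, b in edges if labels[a] != labels[b]}
    return [(float(abs(mean[p] - mean[q])), int(p), int(q)) for p, q in pairs]


def hierarchy(video, k=1.0, levels=3):
    """Return a list of per-voxel label volumes, from fine to coarse."""
    edges = voxel_edges(video)
    labels = merge_pass(video.size, edges, k)
    tree = [labels.reshape(video.shape)]
    for _ in range(levels - 1):
        k *= 2  # assumed schedule: loosen the merge criterion at each level
        r_labels = merge_pass(int(labels.max()) + 1,
                              region_edges(video, labels, edges), k)
        labels = r_labels[labels]  # push region merges back down to voxels
        tree.append(labels.reshape(video.shape))
    return tree
```

For example, calling `hierarchy(video, k=4.0, levels=3)` on a volume made of three vertical bands of constant intensity yields one label volume per level, and a downstream application can pick whichever level has the granularity it needs.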

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the playing field. To this end, we propose a novel approach to detect the locations where the play will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground-level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach for soccer, basketball and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research).
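
The sparse-to-dense step and the search for converging motion can be illustrated with a short Python sketch. It is only a sketch under stated assumptions and not the method from the paper: it swaps in Gaussian-weighted averaging to densify the sparse player velocities and takes the grid cell with the most negative divergence as the convergence location. The function names, the `sigma` parameter, and the grid layout are hypothetical.

```python
import numpy as np


def dense_motion_field(positions, velocities, grid_shape, sigma=3.0):
    """Spread sparse (x, y) player velocities over an (H, W) ground-plane grid.

    positions, velocities: arrays of shape (n_players, 2) in (x, y) order.
    Each grid cell receives a Gaussian-weighted average of player velocities.
    """
    h, w = grid_shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Squared distance from every cell to every player: shape (H, W, n_players)
    d2 = ((xs[..., None] - positions[:, 0]) ** 2
          + (ys[..., None] - positions[:, 1]) ** 2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=-1, keepdims=True) + 1e-12
    vx = weights @ velocities[:, 0]
    vy = weights @ velocities[:, 1]
    return vx, vy


def convergence_point(vx, vy):
    """Return (x, y) of the cell where the flow converges most strongly.

    Convergence is measured here as the most negative divergence
    d(vx)/dx + d(vy)/dy, estimated with finite differences.
    """
    divergence = np.gradient(vx, axis=1) + np.gradient(vy, axis=0)
    y, x = np.unravel_index(np.argmin(divergence), divergence.shape)
    return int(x), int(y)
```

With four players placed symmetrically around a point and all moving towards it, `convergence_point` returns that point. On real tracking data, the field would be recomputed at each time-step.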

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website.

Fall 2010 GRASP Seminar – Irfan Essa, Georgia Institute Of Technology, “Two Talks On Video Analysis: 1 Segmentation Of Video And 2 Prediction Of Actions In Video” | GRASP Laboratory – University Of Pennsylvania

September 20th, 2010 Irfan Essa Posted in Computational Photography and Video, Presentations No Comments »

Friday September 24, 2010 from 11:00am to 12:00pm

My research group is focused on a variety of approaches for video analysis and synthesis. In this talk, I will focus on two of our recent efforts. One is aimed at robust spatio-temporal segmentation of video, and the other at using motion and flow to predict actions from video.

In the first part of the talk, I will present an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. In this effort, we begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. I will demonstrate a variety of examples of how this robust segmentation works, and will show additional examples of video-retargeting that use the saliency from this segmentation approach. (Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, CVPR 2010, in collaboration with Google Research).

In the second part of this talk, I will show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time, and is needed for modeling and predicting how the multi-agent play evolves over time on the field. To this end, we propose a novel approach to detect the locations where the play will proceed, e.g. where interesting events will occur, by tracking player positions and movements over time. To achieve this, we extract the ground-level sparse movement of players in each time-step, and then generate a dense motion field. Using this field we detect locations where the motion converges, implying positions towards which the play is evolving. I will show examples of how we have tested this approach for soccer, basketball and hockey. (Kihwan Kim, Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins, Irfan Essa, CVPR 2010, in collaboration with Disney Research).

Time permitting, I will show some more videos of our recent work on video analysis and synthesis. For more information, papers, and videos, see my website at http://prof.irfanessa.com/

Presenter’s Biography:

Irfan Essa is a Professor in the School of Interactive Computing (iC) of the College of Computing (CoC), and Adjunct Professor in the School of Electrical and Computer Engineering, Georgia Institute of Technology (GA Tech), in Atlanta, Georgia, USA.

Irfan Essa works in the areas of Computer Vision, Computer Graphics, Computational Perception, Robotics and Computer Animation, with potential impact on Video Analysis and Production (e.g., Computational Photography & Video, Image-based Modeling and Rendering, etc.), Human Computer Interaction, and Artificial Intelligence research. Specifically, he is interested in the analysis, interpretation, authoring, and synthesis of video, with the goals of building aware environments, recognizing and modeling human activities and behaviors, and developing dynamic and generative representations of time-varying streams. He has published over 150 scholarly articles in leading journals and conference venues on these topics and has received awards for his research and teaching.

He joined the Georgia Tech faculty in 1996 after earning his MS (1990) and Ph.D. (1994), and holding a research faculty position, at the Massachusetts Institute of Technology (Media Lab) [1988-1996]. His doctoral research was in the area of Facial Recognition, Analysis, and Synthesis.

Presentation at International Workshop on Video (2009): “Temporal Representations of Video for Analysis and Synthesis”

May 26th, 2009 Irfan Essa Posted in Computational Photography and Video, Presentations No Comments »

“Temporal Representations of Video for Analysis and Synthesis” at IWV09: International Workshop on Video, in Barcelona, Spain, May 25-27, 2009.

(Slides, NO Video)

Abstract

I will present a variety of temporal models of video that we have been studying and developing for the analysis and synthesis of video. For synthesis of videos, we have been developing representations that support example-based re-synthesis and spatio-temporal re-targeting. These approaches build on graph-based methods, and we present techniques for similarity metrics for video, segmentation in video, and merging of different video streams. I will showcase a series of examples of these approaches applied to generate new videos.

For analysis of videos, we have developed a series of representations to observe and model activities in videos. Building on low-level measures of movement and motion in videos, we have incorporated higher-level temporal generative models to represent and recognize observed activities. I will discuss the strengths of a variety of State-based, Markovian, Grammar-based and Network-based representations that we have employed for recognizing activities from video. I will also discuss approaches for unsupervised discovery and recognition of activities.

Time permitting, I will describe some new efforts that move towards understanding mobile imaging and video, video authoring, and video on the web. Within these, I will discuss issues of collaborative imaging, collective authoring, ad-hoc sensor networks, and peer production with images and videos. Using these concepts to focus the conversation, I will discuss how all of these issues are impacting the field of Journalism and Reporting and how we have started on a new interdisciplinary research and education effort we call Computational Journalism.

Presentation at CMU’s Computational Thinking Seminar Series (2009): “From Computational Photography and Video to Computational Journalism”

March 10th, 2009 Irfan Essa Posted in Computational Journalism, Computational Photography and Video, Presentations 1 Comment »

From Computational Photography and Video to Computational Journalism

Irfan Essa
Georgia Institute of Technology
School of Interactive Computing, GVU and RIM Centers
April 21, 2009.

(see the video of this presentation)

Abstract

Our consumption of images (photography/video) continues to grow with the pervasiveness of computing (networking, mobile and media) technologies in our daily lives. Everyone now has a mobile camera, and digital image capture, processing, and sharing have become ubiquitous in our society. This has had a significant impact on how we want to (a) create novel scenes, (b) share our experiences with images, and (c) interact with large amounts of images and videos from many sources. In this talk, I will start with a brief overview of a series of ongoing efforts in the analysis of images and videos for rendering novel scenes, interacting with images/videos and collaboratively authoring new content. I will describe some work on video-based rendering and synthesizing novel videos (and scenes) and highlight the technical contributions being made in areas of Computational Photography and Video.

Using these sets of efforts as a foundation, I will showcase where things are headed in terms of user-generated content, media sharing, annotation, and reuse with large-scale networks. In essence, everybody is a content producer, distributor, and consumer. I will describe some new efforts that move towards understanding mobile imaging and video, and also discuss issues of collaborative imaging, collective authoring, ad-hoc sensor networks, and peer production with images and videos. Using these concepts, I will discuss how all of these issues are impacting the field of Journalism and Reporting and how we have started on a new interdisciplinary research and education effort we call Computational Journalism. The concept of Computational Journalism includes more than just imaging; it relates to media and information in general and is aimed at the study of how we remain informed in this connected world. I will outline this new field and relate it back to imaging, with examples from some of our recent work in this new area.

Presentation at Duke University (2009): “Computation & Journalism: The Impact of Technology on Journalism, Information Quality, and Civic Literacy”

January 10th, 2009 Irfan Essa Posted in Computational Journalism, Presentations 2 Comments »

Talk/Presentation at Duke University, Jan 27, 2009. Hosted by James Hamilton, director of the DeWitt Wallace Center for Media and Democracy at Duke University.

Computation & Journalism: The Impact of Technology on Journalism, Information Quality, and Civic Literacy

Irfan Essa
Georgia Institute of Technology
School of Interactive Computing, GVU and RIM Centers 

Fundamentally, journalism is the process of collecting news information and disseminating that information with a layer of contextualization and understanding provided by journalists in the form of a news story. Recent advances in computational technology are rapidly affecting how news is gathered, reported, and distributed, and how stories are authored and told. New technologies for aggregating, visualizing, summarizing, consuming, and collaborating on news are becoming increasingly popular. These advances are challenging the traditional practices of journalism and directly affecting the future of news production and consumption. Both computation and journalism share a deep interest in information and the value it provides to society, and they are deeply involved in the future of storytelling in various contexts, especially current events. This requires us to consider how Computation and Journalism can help each other.

In this talk, I will present a vision for a new area of research and education that brings together the fields of computation and journalism to enhance both disciplines and supports the creation of a “Computationalist-Journalist,” a new kind of participant in the public conversation. I will start by describing how imaging, video, and media production and consumption have changed with technology and then how similar technologies can be used for Journalism and related Civic Literacy issues. I will describe new technologies that have changed the landscape of both Computation and Journalism and use these developments to showcase where we are headed with both Computation and Journalism, and how technologists and journalists can work together to create new computing tools that further the aims of journalism.

Bio

Presentation: At Qualcomm Research in San Diego, CA (2008) “From Computational Photography and Video to Computational Journalism”

September 24th, 2008 Irfan Essa Posted in Computational Journalism, Computational Photography and Video, Presentations No Comments »

From Computational Photography and Video to Computational Journalism

 
Abstract

Digital image capture, processing, and sharing have become pervasive in our society. This has had a significant impact on how we create novel scenes, how we share our experiences, and how we interact with images and videos. In this talk, I will present an overview of a series of ongoing efforts in the analysis of images and videos for rendering novel scenes. First I will discuss (in brief) our work on Video Textures, where repeating information is extracted to generate extended sequences of videos. I will also describe some of our extensions to this approach that allow for controlled generation of animations of video sprites. We have developed various learning and optimization techniques that allow for video-based animations of photo-realistic characters. Using these sets of approaches as a foundation, I will then show how new images and videos can be generated. I will show examples of Photorealistic and Non-photorealistic Renderings of Scenes (Videos and Images) and how these methods support the media reuse culture, so common these days with user-generated content. I will then describe some of our new efforts that move towards understanding mobile imaging and video, and also discuss issues of collaborative imaging and authoring, ad-hoc sensor networks, and peer production with images and videos, leading to new concepts of how computation has impacted journalism. Time permitting, I will also share some of our efforts on video annotation and how we have taken some of these new concepts of video analysis to classrooms.

Presentation: CETEE (2007): “Computational Photography & Video: Research & Education”

October 30th, 2007 Irfan Essa Posted in Presentations, Research, Teaching No Comments »

I was invited to participate and present at the CETEE 2007, Islamabad, November 27-28, 2007.

This meeting has recently been postponed.

Presentation: U of Maryland: “Computational Photography and Video: Spatio Temporal Analysis for Synthesis”

September 25th, 2007 Irfan Essa Posted in Computational Photography and Video, Presentations No Comments »

Computational Photography and Video: Spatio Temporal Analysis for Synthesis of Novel Images and Videos.

ABSTRACT

Digital image capture and processing have recently had a significant impact on the computer graphics quest for rendering novel scenes. In this talk, I will present an overview of a series of ongoing efforts in the analysis of images and videos for rendering novel scenes. First I will discuss (in brief) our work on Video Textures, where repeating information is extracted to generate extended sequences of videos. I will then describe some of our extensions to this approach that allow for controlled generation of animations of video sprites. We have developed various learning and optimization techniques that allow for video-based animations of photo-realistic characters. Then I will describe additional approaches for image and video synthesis that build on optimal patch-based copying of samples. I will show how our methods allow for iterative refinement, with a variety of optimization criteria, and allow for extension to synthesis of both images and video from very limited samples. Using these sets of approaches as a foundation, I will then show how new images and videos can be generated. I will show examples of Photorealistic and Non-photorealistic Renderings of Scenes (Videos and Images) and how these methods support the media reuse culture, so common these days with user-generated content. Time permitting, I will also share some of our efforts on video annotation and how we have taken some of these new concepts of video analysis to undergraduate classrooms.
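
The basic Video Textures idea, extracting repetition to extend a clip, can be sketched in a few lines of Python. This is a simplified illustration, not the full system. It assumes frames are compared by plain Euclidean distance, that a jump from frame i to frame j is likely when frame j resembles frame i + 1, and that jumps into the last frame are simply forbidden because it has no successor. The function names and the `sigma` parameter are hypothetical.

```python
import numpy as np


def transition_probabilities(frames, sigma=1.0):
    """Return P of shape (n - 1, n): P[i, j] is the chance of showing frame j
    right after frame i, for a stack of n frames of shape (n, ...)."""
    flat = frames.reshape(len(frames), -1).astype(float)
    # dist[i, j]: Euclidean distance between frame i and frame j
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A jump i -> j looks seamless when frame j resembles frame i + 1.
    logits = -dist[1:, :] / sigma
    logits[:, -1] = -np.inf  # never enter the last frame (it is a dead end)
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def synthesize(frames, length, sigma=1.0, seed=0):
    """Sample a frame order of the requested length, starting at frame 0."""
    rng = np.random.default_rng(seed)
    probs = transition_probabilities(frames, sigma)
    order = [0]
    while len(order) < length:
        current = order[-1]
        order.append(int(rng.choice(len(frames), p=probs[current])))
    return order
```

A small `sigma` keeps the output close to the original frame order with occasional seamless jumps, while a larger `sigma` allows more varied but potentially more visible transitions.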

Talk: Keynote at WIAMIS 2007 “Data-driven and Procedural Analysis and Synthesis of Multimedia”

June 14th, 2007 Irfan Essa Posted in Computational Photography and Video, Presentations No Comments »

WIAMIS 2007: “Data-driven and Procedural Analysis and Synthesis of Multimedia”

Abstract

In this talk, I will outline the changes that have come about in the analysis and synthesis of multimedia due to the availability of large amounts of data. I will present several of the successful methods that have been introduced in the last few years for example-based synthesis for animation and rendering of videos. I will also show how these methods have been extended to other modalities, and how these approaches need to be extended by developing parametric and procedural models to represent temporal variations. Using examples from my group's work and also other efforts, I will discuss how video is becoming an accessible medium for all, and I will also discuss some newer work on authoring of multimedia content.
