
Paper in IEEE CVPR 2013 “Decoding Children’s Social Behavior”

June 27th, 2013 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Denis Lantsman, Gregory Abowd, James Rehg, PAMI/ICCV/CVPR/ECCV, Papers, Thomas Ploetz

  • J. M. Rehg, G. D. Abowd, A. Rozga, M. Romero, M. A. Clements, S. Sclaroff, I. Essa, O. Y. Ousley, Y. Li, C. Kim, H. Rao, J. C. Kim, L. L. Presti, J. Zhang, D. Lantsman, J. Bidwell, and Z. Ye (2013), “Decoding Children’s Social Behavior,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
      @inproceedings{rehg2013decoding,
      Author = {James M. Rehg and Gregory D. Abowd and Agata Rozga and Mario Romero and Mark A. Clements and Stan Sclaroff and Irfan Essa and Opal Y. Ousley and Yin Li and Chanho Kim and Hrishikesh Rao and Jonathan C. Kim and Liliana Lo Presti and Jianming Zhang and Denis Lantsman and Jonathan Bidwell and Zhefan Ye},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Doi = {10.1109/CVPR.2013.438},
      Month = {June},
      Organization = {IEEE Computer Society},
      Title = {Decoding Children's Social Behavior},
      Year = {2013}}


We introduce a new problem domain for activity recognition: the analysis of children’s social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1-2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.

Full database available from the [WEBSITE] link above.

via IEEE Xplore – Decoding Children’s Social Behavior.


Paper in IEEE CVPR 2013 “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition”

June 27th, 2013 Irfan Essa Posted in Activity Recognition, Behavioral Imaging, Grant Schindler, PAMI/ICCV/CVPR/ECCV, Papers, Sports Visualization, Thomas Ploetz, Vinay Bettadapura

  • V. Bettadapura, G. Schindler, T. Ploetz, and I. Essa (2013), “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
      @inproceedings{bettadapura2013augmenting,
      Author = {Vinay Bettadapura and Grant Schindler and Thomas Ploetz and Irfan Essa},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Doi = {10.1109/CVPR.2013.338},
      Month = {June},
      Organization = {IEEE Computer Society},
      Title = {Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition},
      Year = {2013}}


We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.
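The core idea of enriching a bag-of-words histogram with local temporal structure can be sketched as follows. This is a deliberately simplified illustration: the paper's actual method discovers discriminative n-grams and randomly sampled regular expressions, whereas here we just append the most frequent n-grams of a discrete event stream to its word histogram.

```python
from collections import Counter

def augment_bow(stream, n=2, top=5):
    """Augment a bag-of-words histogram over a discrete event stream
    with its most frequent n-grams, which capture the local temporal
    ordering that a plain BoW histogram discards.
    (Illustrative sketch only; the paper additionally discovers
    randomly sampled regular expressions as features.)"""
    hist = Counter(stream)                         # standard BoW counts
    grams = Counter(tuple(stream[i:i + n])         # sliding n-gram counts
                    for i in range(len(stream) - n + 1))
    for g, c in grams.most_common(top):            # keep the top-k n-grams
        hist["->".join(g)] = c
    return hist
```

For example, `augment_bow(["walk", "stop", "walk", "stop"])` yields a histogram that, besides counts for "walk" and "stop", contains a feature "walk->stop", so a classifier can distinguish activities that share vocabulary but differ in ordering.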

via IEEE Xplore – Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition.


Google I/O 2013: Secrets of Video Stabilization on YouTube

May 28th, 2013 Irfan Essa Posted in Computational Photography and Video, Google, In The News, Matthias Grundmann, Presentations, Vivek Kwatra

Presentation at Google I/O 2013 by Matthias Grundmann, John Gregg, and Vivek Kwatra on our Video Stabilizer on YouTube.

Video stabilization is a key component of YouTube’s video enhancement tools. All YouTube uploads are automatically checked for shakiness, and stabilization is suggested if needed. This talk describes the technical details behind our fully automatic one-click stabilization technology, including camera path optimization, rolling shutter detection and removal, distributed computing for real-time previews, and camera shake detection.

via Secrets of Video Stabilization on YouTube — Google I/O 2013.


Summer in Barcelona. Teaching classes and such.

May 13th, 2013 Irfan Essa Posted in CnJ, Computational Photography, Daniel Castro, Study Abroad

I am here in Barcelona for my 4th summer, participating in the Georgia Tech College of Computing’s International Study Abroad Program in Barcelona, Spain. I am teaching two classes and spending time with 64 student participants, 4 teaching assistants, and 6 faculty from Georgia Tech, as well as with the faculty at the Facultat d’Informàtica de Barcelona (UPC), our hosts here in Barcelona.

  • CS 4464: Computational Journalism: This class is aimed at understanding the computational and technological advancements in the area of journalism. The primary focus is on the study of technologies for developing new tools for (a) sense-making from diverse news information sources, (b) the impact of more and cheaper networked sensors, (c) collaborative human models for information aggregation and sense-making, (d) mashups and the use of programming in journalism, (e) the impact of mobile computing and data gathering, (f) computational approaches to information quality, (g) data mining for personalization and aggregation, and (h) citizen journalism.
  • CS 4475: Computational Photography: This class explores perceptual and technical aspects of pictures, and more precisely the capture and depiction of reality on a 2D medium. The scientific, perceptual, and artistic principles behind image-making will be emphasized. Topics include the relationship between pictorial techniques and the human visual system; intrinsic limitations of 2D representations and their possible compensations; and technical issues involving depiction. Technical aspects of image capture and rendering, and exploration of how such a medium can be used to its maximum potential, will be examined. Material from the recent Coursera offering of Computational Photography will be leveraged in this class.


Computational Photography MOOC on Coursera, comes to a close.

May 7th, 2013 Irfan Essa Posted in Computational Photography, Computational Photography and Video, Coursera, Denis Lantsman

The Computational Photography MOOC offering in Coursera came to a close with the following final announcement (abridged here) on May 7, 2013.

Computational photographers:

Thanks for joining us for an engaging 5 weeks of collaboratively learning the wonderful aspects of computational photography. We bid you all farewell now and hope to see some of you in a future reincarnation of this class, building on the feedback provided by many of you. Keep a lookout for the repeat of the same class, and for another class continuing to more advanced topics.

Final graded scores and the certificate of completion will be made available this week. All assignment solutions are available, as requested. We will also keep the class site open for a while.

Do remember that we still welcome your feedback, so use the forums. If you haven’t done so already, please spend a few minutes filling out the survey for the last week of class, which is part of a study we are conducting to understand and evaluate online class offerings like this one.

Again, thanks for participating, and good luck with your future endeavors. And remember to take good pictures, and to have fun computing with photographs.

via Announcements | Computational Photography.


Paper in AISTATS 2013 “Beyond Sentiment: The Manifold of Human Emotions”

April 29th, 2013 Irfan Essa Posted in AAAI/IJCAI/UAI, Behavioral Imaging, Computational Journalism, Numerical Machine Learning, Papers, WWW

  • S. Kim, F. Li, G. Lebanon, and I. A. Essa (2013), “Beyond Sentiment: The Manifold of Human Emotions,” in Proceedings of AISTATS, 2013. [PDF] [BIBTEX]
      @inproceedings{kim2013beyond,
      Author = {Seungyeon Kim and Fuxin Li and Guy Lebanon and Irfan A. Essa},
      Booktitle = {Proceedings of AISTATS},
      Title = {Beyond Sentiment: The Manifold of Human Emotions},
      Year = {2013}}


Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without manifold, we are also able to visualize different notions of positive sentiment in different domains.
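The distinction the abstract draws — continuous coordinates for a document's emotional content rather than one of a few discrete labels — can be illustrated with a crude stand-in. The sketch below simply projects word-count vectors onto a low-dimensional linear subspace via PCA; this is not the paper's actual manifold model, only a way to see what "continuous coordinates instead of a finite label set" means.

```python
import numpy as np

def embed_documents(counts, dim=2):
    """Project document word-count vectors onto a low-dimensional
    continuous space via PCA. A crude stand-in for the paper's
    emotion manifold: each document receives continuous coordinates
    rather than one of a few discrete emotion labels."""
    X = counts - counts.mean(axis=0)            # center the data
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:dim].T                       # coordinates on top axes
```

Documents with similar vocabulary land near each other, and gradations of emotion appear as positions along the axes rather than as hard class boundaries.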

via [1202.1568] Beyond Sentiment: The Manifold of Human Emotions.


Paper in ICCP 2013 “Post-processing approach for radiometric self-calibration of video”

April 19th, 2013 Irfan Essa Posted in Computational Photography and Video, ICCP, Matthias Grundmann, Papers, Sing Bing Kang

  • M. Grundmann, C. McClanahan, S. B. Kang, and I. Essa (2013), “Post-processing Approach for Radiometric Self-Calibration of Video,” in Proceedings of IEEE International Conference on Computational Photography, 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
      @inproceedings{grundmann2013postprocessing,
      Author = {Matthias Grundmann and Chris McClanahan and Sing Bing Kang and Irfan Essa},
      Booktitle = {Proceedings of IEEE International Conference on Computational Photography},
      Doi = {10.1109/ICCPhot.2013.6528307},
      Month = {April},
      Organization = {IEEE Computer Society},
      Title = {Post-processing Approach for Radiometric Self-Calibration of Video},
      Year = {2013}}


We present a novel data-driven technique for radiometric self-calibration of video from an unknown camera. Our approach self-calibrates radiometric variations in video, and is applied as a post-process; there is no need to access the camera, and in particular it is applicable to internet videos. This technique builds on empirical evidence that the camera response function (CRF) in video should be regarded as time-variant, as it changes with scene content and exposure, rather than relying on a single camera response function. We show that a time-varying mixture of responses produces better accuracy and consistently reduces the error in mapping intensity to irradiance when compared to a single response model. Furthermore, our mixture model counteracts the effects of possible nonlinear exposure-dependent intensity perturbations and white-balance changes caused by proprietary camera firmware. We further show how radiometrically calibrated video improves the performance of other video analysis algorithms, enabling a video segmentation algorithm to be invariant to exposure and gain variations over the sequence. We validate our data-driven technique on videos from a variety of cameras and demonstrate the generality of our approach by applying it to internet video.
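The central modeling idea — a time-varying mixture of response functions rather than a single CRF — can be sketched with a toy per-frame fit over a fixed basis of gamma curves. The basis, the choice of gammas, and the plain least-squares fit are all illustrative assumptions here, not the paper's actual estimation procedure.

```python
import numpy as np

def fit_crf_mixture(irradiance, intensity, gammas=(1.0, 1.8, 2.2, 2.6)):
    """Fit per-frame weights of a mixture of fixed gamma-curve responses,
    so that intensity ~= sum_k w_k * irradiance**(1/gamma_k).
    Toy illustration of a time-varying mixture-of-responses model:
    refitting the weights for each frame lets the effective CRF drift
    with scene content and exposure."""
    basis = np.stack([irradiance ** (1.0 / g) for g in gammas], axis=1)
    w, *_ = np.linalg.lstsq(basis, intensity, rcond=None)
    return w, basis @ w                          # weights, fitted intensity
```

Refitting `w` on each frame (or block of frames) gives a response curve that tracks exposure and white-balance changes, which a single fixed CRF cannot do.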

via IEEE Xplore – Post-processing approach for radiometric self-calibration of video.


Coursera Course on Computational Photography, NOW LIVE and RUNNING!

March 23rd, 2013 Irfan Essa Posted in Computational Photography, Coursera, Denis Lantsman

We are live!

Welcome to the course website!

As you take a look around, there are a few things we want to bring to your attention:

1. Do note that while this is an introductory class, we do require that you have a working knowledge of college-level mathematics, including concepts like linear algebra and calculus. In addition, the programming assignments will require access to a computer with Python and OpenCV. Instructions for installing this software are available here, and are also linked in the navigation bar of the site. The responsibility to get your computer working with this software is entirely yours. The diverse nature of computer platforms and software does pose challenges in setting up such systems, and we encourage folks to use the forums to help each other with any challenges faced. Please get started with the software installation as soon as possible!

2. These online courses are still rather mysterious, and we want to learn as much as we can about them: What works? What doesn’t? Accordingly, we invite you to participate in occasional surveys that are part of a research study. The surveys will tell us about you and how the course is working for you. The information you provide will help us make the course better and improve our ability to provide more intellectually engaging content. Participation in the surveys is completely optional and will not affect your grade in the course in any way. For your reference, here is a link to a PDF Consent Form that describes the surveys and study in more detail. If you would like to participate, begin by filling out the background survey here. Also, keep an eye on the syllabus page and the weekly announcements for end-of-week surveys.

3. The syllabus page will be updated every week to reflect all of the course content as it becomes available. Please consult that page if you are unsure about what to do next. The page also contains useful information about class logistics, policies, and frequently asked questions.

4. Note that the lecture videos page contains links to the subtitles, slides, and video file downloads for each of the lecture videos. Thank you for your attention, and I hope you will have fun in the following few weeks!


via Announcements | Computational Photography.

Computational Photography via Coursera


Matthias Grundmann’s PhD Thesis Defense (2013): “Computational Video: Post-processing Methods for Stabilization, Retargeting and Segmentation”

February 4th, 2013 Irfan Essa Posted in Computational Photography and Video, Matthias Grundmann, PhD

Title: Computational Video: Post-processing Methods for Stabilization, Retargeting and Segmentation

Matthias Grundmann
School of Interactive Computing
College of Computing
Georgia Institute of Technology

Date: February 04, 2013 (Monday)
Time: 3:00p – 6:00p EST
Location: Nano building, 116-118



In this thesis, we address a variety of challenges for the analysis and enhancement of Computational Video. We present novel post-processing methods to bridge the gap between professional footage and the casually shot videos mostly seen on online sites. Our research presents solutions to three well-defined problems: (1) video stabilization and rolling shutter removal in casually shot, uncalibrated videos; (2) content-aware video retargeting; and (3) spatio-temporal video segmentation to enable efficient video annotation. We showcase several real-world applications building on these techniques.

We start by proposing a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. We compute camera paths that are optimally partitioned into constant, linear and parabolic segments mimicking the camera motions employed by professional cinematographers. To achieve this, we propose a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. An additional challenge in videos shot on mobile phones is rolling shutter distortion. Modern CMOS cameras capture the frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. We propose a solution based on a novel mixture model of homographies parametrized by scanline blocks to correct these rolling shutter distortions. Our method neither relies on a priori knowledge of the readout time nor requires prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos. We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer.
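The linear-programming formulation can be sketched in a toy 1-D setting: penalize the L1 norms of the path's derivatives with slack variables, and constrain the smoothed path to stay within a crop window of the original. The weights, the crop bound, and the restriction to the first two derivatives (the full method also penalizes the third) are illustrative simplifications.

```python
import numpy as np
from scipy.optimize import linprog

def l1_smooth_path(c, w1=10.0, w2=1.0, crop=0.1):
    """Toy 1-D sketch of L1-optimal camera path smoothing.
    Minimize  w1 * sum|p'| + w2 * sum|p''|  s.t.  |p - c| <= crop,
    where c is the original (shaky) path and p the smoothed one.
    Absolute values are handled with slack variables e1, e2."""
    T = len(c)
    n1, n2 = T - 1, T - 2
    nvar = T + n1 + n2                        # variables: p, e1, e2
    cost = np.concatenate([np.zeros(T), w1 * np.ones(n1), w2 * np.ones(n2)])
    A, b = [], []

    def bound(coeffs, slack):                 # +/-(D p) - e <= 0, i.e. |D p| <= e
        for sign in (1.0, -1.0):
            row = np.zeros(nvar)
            for j, v in coeffs:
                row[j] = sign * v
            row[slack] = -1.0
            A.append(row)
            b.append(0.0)

    for t in range(n1):                       # first-derivative slacks
        bound([(t + 1, 1.0), (t, -1.0)], T + t)
    for t in range(n2):                       # second-derivative slacks
        bound([(t + 2, 1.0), (t + 1, -2.0), (t, 1.0)], T + n1 + t)

    bounds = [(x - crop, x + crop) for x in c] + [(0, None)] * (n1 + n2)
    res = linprog(cost, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=bounds, method="highs")
    return res.x[:T]
```

Because the objective is L1 rather than L2, the optimal path is piecewise low-order (flat and linear stretches in this toy), which is exactly the static/pan behavior a cinematographer would produce, instead of the residual wobble a quadratic penalty leaves behind.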

We address the challenge of changing the aspect ratio of videos by proposing algorithms that retarget videos to fit the form factor of a given device without stretching or letter-boxing. Our approaches use all of the screen’s pixels, while striving to deliver as much video-content of the original as possible. First, we introduce a new algorithm that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. Second, we present a technique that builds on the above-mentioned video stabilization approach. We effectively automate classical pan and scan techniques by smoothly guiding a virtual crop window via saliency constraints.
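The automated pan-and-scan idea — steering a virtual crop window with saliency while keeping it smooth and inside the frame — can be sketched in one dimension. Exponential smoothing stands in for the actual L1 path optimization, and the function names and parameters are illustrative.

```python
def crop_path(saliency_x, frame_w, crop_w, smooth=0.9):
    """Toy automated pan-and-scan: exponentially smooth the per-frame
    saliency center and clamp so the crop window stays inside the frame.
    Returns the left edge of the crop window for each frame."""
    lefts, x = [], saliency_x[0]
    for s in saliency_x:
        x = smooth * x + (1.0 - smooth) * s       # smoothed virtual camera
        left = min(max(x - crop_w / 2.0, 0.0), frame_w - crop_w)
        lefts.append(left)
    return lefts
```

The smoothing keeps the virtual camera from chasing every saliency jitter, while the clamp guarantees the crop never leaves the frame; the real method replaces the smoother with the same L1-optimal path machinery used for stabilization.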

Finally, we introduce an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation as users interact with the video, enabling efficient annotation of objects within the video.
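The hierarchical grouping step can be sketched with a tiny union-find merge over a region graph: at each level, regions connected by an edge whose appearance distance falls below that level's threshold are merged, and coarser levels build on the merges of finer ones. Scalar features and a single threshold per level are simplifications of the actual region-graph algorithm.

```python
def hierarchical_segment(feats, edges, taus):
    """Toy hierarchical graph-based grouping: at each level, merge
    regions connected by an edge whose (scalar) appearance distance
    is below that level's threshold tau. Coarser levels reuse the
    merges of finer ones, yielding a tree of segmentations."""
    parent = list(range(len(feats)))

    def find(x):                               # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    levels = []
    for tau in sorted(taus):                   # fine -> coarse
        for a, b in edges:
            if abs(feats[a] - feats[b]) < tau:
                parent[find(a)] = find(b)
        levels.append([find(i) for i in range(len(feats))])
    return levels
```

Each entry of the returned list is a labeling of the regions at one granularity, so an application (such as interactive annotation) can pick the level whose regions best match the object being labeled.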


Committee:

  • Dr. Irfan Essa (Advisor, School of Interactive Computing, Georgia Tech)
  • Dr. Jim Rehg (School of Interactive Computing, Georgia Tech)
  • Dr. Frank Dellaert (School of Interactive Computing, Georgia Tech)
  • Dr. Michael Black (Perceiving Systems Department, Max Planck Institute for Intelligent Systems)
  • Dr. Sing Bing Kang (Adjunct Faculty, Georgia Tech; Microsoft Research, Microsoft Corp.)
  • Dr. Vivek Kwatra (Google Research, Google Inc.)


Videos from the Computational Journalism Symposium (Jan 31 – Feb 1, 2013).

February 1st, 2013 Irfan Essa Posted in Computational Journalism, Events, Presentations

The Computation + Journalism Symposium 2013, held Jan 31 – Feb 1, 2013, at the Georgia Institute of Technology, Atlanta, GA, USA, was a huge success. Please see the videos of all the sessions here. Also see my discussion of computational journalism with Phil Meyer, and my slides and take-away points from the closing session.
