William Mong Distinguished Lecture at the University of Hong Kong on “Video Cameras are Everywhere: Data-Driven Methods for Video Analysis and Enhancement”

December 11th, 2014 Irfan Essa Posted in Computational Photography and Video, Computer Vision, Presentations No Comments »

Video Cameras are Everywhere: Data-Driven Methods for Video Analysis and Enhancement

Irfan Essa (prof.irfanessa.com)
Georgia Institute of Technology
School of Interactive Computing
GVU and RIM @ GT Centers 

Abstract 

In this talk, I will start by describing the pervasiveness of image and video content, and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for analysis and enhancement of video content. I will start with some of our earlier work on temporal modeling of video, then lead up to some of our current work and describe two main projects: (1) our approach for a video stabilizer, currently implemented and running on YouTube, and its extensions; and (2) a robust and scalable method for video segmentation.

I will describe, in some detail, our video stabilization method, which generates stabilized videos and is in wide use. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. It also supports removal of the rolling shutter distortions common in modern CMOS cameras, which capture the frame one scanline at a time, resulting in non-rigid image distortions such as shear and wobble. Our method does not rely on a priori knowledge and works on video from any camera or on legacy footage. I will showcase examples of this approach and also discuss how this method was launched and now runs on YouTube, with millions of users.

Then I will describe an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. This hierarchical approach generates high-quality segmentations, and we demonstrate the use of this segmentation as users interact with the video, enabling efficient annotation of objects within the video. I will also show some recent work on how this segmentation and annotation can be used for dynamic scene understanding.

Bio: http://prof.irfanessa.com/bio 

Paper in BMVC (2014): “Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries”

September 5th, 2014 Irfan Essa Posted in Computational Photography and Video, PAMI/ICCV/CVPR/ECCV, S. Hussain Raza No Comments »

  • S. H. Raza, O. Javed, A. Das, H. Sawhney, H. Cheng, and I. Essa (2014), “Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries,” in Proceedings of British Machine Vision Conference (BMVC), Nottingham, UK, 2014. [PDF] [WEBSITE] [BIBTEX]
    @inproceedings{2014-Raza-DEFVUGCOBDEFVUGCOB,
      Address = {Nottingham, UK},
      Author = {Syed Hussain Raza and Omar Javed and Aveek Das and Harpreet Sawhney and Hui Cheng and Irfan Essa},
      Booktitle = {{Proceedings of British Machine Vision Conference (BMVC)}},
      Date-Added = {2014-08-30 12:56:03 +0000},
      Date-Modified = {2014-11-10 16:10:07 +0000},
      Month = {September},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2014-Raza-DEFVUGCOBDEFVUGCOB.pdf},
      Title = {Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries},
      Url = {http://www.cc.gatech.edu/cpl/projects/videodepth/},
      Year = {2014},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/videodepth/}}

We present an algorithm to estimate depth in dynamic video scenes.

We propose to learn and infer depth in videos from appearance, motion, occlusion boundaries, and geometric context of the scene. Using our method, depth can be estimated from unconstrained videos with no requirement of camera pose estimation, and with significant background/foreground motions. We start by decomposing a video into spatio-temporal regions. For each spatio-temporal region, we learn the relationship of depth to visual appearance, motion, and geometric classes. Then we infer the depth information of new scenes using a piecewise planar parametrization estimated within a Markov random field (MRF) framework, combining the learned appearance-to-depth mappings with smoothness constraints guided by occlusion boundaries. Subsequently, we perform temporal smoothing to obtain temporally consistent depth maps.

To evaluate our depth estimation algorithm, we provide a novel dataset with ground truth depth for outdoor video scenes. We present a thorough evaluation of our algorithm on our new dataset and the publicly available Make3D static image dataset.

PhD Thesis (2014) by S. Hussain Raza “Temporally Consistent Semantic Segmentation in Videos”

May 2nd, 2014 Irfan Essa Posted in Computational Photography and Video, PhD, S. Hussain Raza No Comments »

Title : Temporally Consistent Semantic Segmentation in Videos

S. Hussain Raza, Ph. D. Candidate in ECE (https://sites.google.com/site/shussainraza5/)

Committee:

Prof. Irfan Essa (advisor), School of Interactive Computing
Prof. David Anderson (co-advisor), School of Electrical and Computer Engineering
Prof. Frank Dellaert, School of Interactive Computing
Prof. Anthony Yezzi, School of Electrical and Computer Engineering
Prof. Chris Barnes, School of Electrical and Computer Engineering
Prof. Rahul Sukthankar, Department of Computer Science and Robotics, Carnegie Mellon University.

Abstract :

The objective of this thesis research is to develop algorithms for temporally consistent semantic segmentation in videos. Though many different forms of semantic segmentation exist, this research focuses on the problem of temporally consistent holistic scene understanding in outdoor videos. Holistic scene understanding requires an understanding of many individual aspects of the scene including 3D layout, objects present, occlusion boundaries, and depth. Such a description of a dynamic scene would be useful for many robotic applications including object reasoning, 3D perception, video analysis, video coding, segmentation, navigation, and activity recognition.

Scene understanding has been studied with great success for still images. However, scene understanding in videos requires additional approaches that account for temporal variation and dynamic information and that exploit causality. As a first step, image-based scene understanding methods can be directly applied to individual video frames to generate a description of the scene. However, these methods do not exploit temporal information across neighboring frames. Further, lacking temporal consistency, image-based methods can produce temporally inconsistent labels across frames. This inconsistency can impact performance, as scene labels suddenly change between frames.
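To make the label-flicker problem concrete, here is a minimal sketch in Python. It is not the method developed in the thesis: it assumes a hypothetical list of per-frame labels for one image region and applies a plain sliding-window majority vote, which is only the simplest possible way to enforce temporal consistency. The function name `smooth_labels` and the `radius` parameter are illustrative assumptions.

```python
from collections import Counter


def smooth_labels(labels, radius=1):
    """Replace each frame's label with the most common label in a window
    of `radius` frames on either side. On a tie, the original label is
    kept if it is among the winners."""
    smoothed = []
    for i, current in enumerate(labels):
        window = labels[max(0, i - radius): i + radius + 1]
        counts = Counter(window)
        top = max(counts.values())
        winners = [label for label, c in counts.items() if c == top]
        smoothed.append(current if current in winners else winners[0])
    return smoothed


# Hypothetical per-frame output of an image-based labeler for one region:
# the isolated "tree" at frame 2 is the kind of flicker described above.
per_frame = ["sky", "sky", "tree", "sky", "sky", "ground", "ground"]
smoothed = smooth_labels(per_frame)
print(smoothed)
```

Note that this window looks at future frames as well as past ones, so it is not causal; a causal variant would restrict the window to frames at or before the current one.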

The objective of this study is to develop temporally consistent scene descriptive algorithms by processing videos efficiently, exploiting causality and data redundancy, and catering for scene dynamics. Specifically, we achieve our research objectives by (1) extracting geometric context from videos to give broad 3D structure of the scene with all objects present, (2) detecting occlusion boundaries in videos due to depth discontinuity, and (3) estimating depth in videos by combining monocular and motion features with semantic features and occlusion boundaries.

Two Ph. D. Defenses the same day. A first for me!

April 2nd, 2014 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Health Systems, PhD, S. Hussain Raza, Students, Yachna Sharma No Comments »

Today, two of my Ph. D. students defended their dissertations. Back to back. Congrats to both as they are both done.

Thesis title: Surgical Skill Assessment Using Motion Texture Analysis
Student: Yachna Sharma, Ph. D. Candidate in ECE
http://users.ece.gatech.edu/~ysharma3/
Date/Time : 2nd April, 1:00 pm

Title : Temporally Consistent Semantic Segmentation in Videos
S. Hussain Raza, Ph. D. Candidate in ECE
https://sites.google.com/site/shussainraza5/
Date/Time : 2nd April, 1:00 pm

Location : CSIP Library, Room 5186, CenterGy One Building

 

Paper in CVIU 2013 “A Visualization Framework for Team Sports Captured using Multiple Static Cameras”

October 3rd, 2013 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Jessica Hodgins, PAMI/ICCV/CVPR/ECCV, Papers, Raffay Hamid, Sports Visualization No Comments »

  • R. Hamid, R. Kumar, J. Hodgins, and I. Essa (2013), “A Visualization Framework for Team Sports Captured using Multiple Static Cameras,” Computer Vision and Image Understanding, p. -, 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
    @article{2013-Hamid-VFTSCUMSC,
      Author = {Raffay Hamid and Ramkrishan Kumar and Jessica Hodgins and Irfan Essa},
      Date-Added = {2013-10-22 13:42:46 +0000},
      Date-Modified = {2014-04-28 17:09:21 +0000},
      Doi = {10.1016/j.cviu.2013.09.006},
      Issn = {1077-3142},
      Journal = {{Computer Vision and Image Understanding}},
      Number = {0},
      Pages = {-},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Hamid-VFTSCUMSC.pdf},
      Title = {A Visualization Framework for Team Sports Captured using Multiple Static Cameras},
      Url = {http://raffayhamid.com/sports_viz.shtml},
      Video = {http://www.youtube.com/watch?v=VwzAMi9pUDQ},
      Year = {2013},
      Bdsk-Url-1 = {http://www.sciencedirect.com/science/article/pii/S1077314213001768},
      Bdsk-Url-2 = {http://dx.doi.org/10.1016/j.cviu.2013.09.006},
      Bdsk-Url-3 = {http://raffayhamid.com/sports_viz.shtml}}

Abstract

We present a novel approach for robust localization of multiple people observed using a set of static cameras. We use this location information to generate a visualization of the virtual offside line in soccer games. To compute the position of the offside line, we need to localize players’ positions, and identify their team roles. We solve the problem of fusing corresponding players’ positional information by finding minimum weight K-length cycles in a complete K-partite graph. Each partite of the graph corresponds to one of the K cameras, whereas each node of a partite encodes the position and appearance of a player observed from a particular camera. To find the minimum weight cycles in this graph, we use a dynamic programming based approach that varies over a continuum from maximally to minimally greedy in terms of the number of graph-paths explored at each iteration. We present proofs for the efficiency and performance bounds of our algorithms. Finally, we demonstrate the robustness of our framework by testing it on 82,000 frames of soccer footage captured over eight different illumination conditions, play types, and team attire. Our framework runs in near-real time, and processes video from 3 full HD cameras in about 0.4 seconds for each set of corresponding 3 frames.
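The fusion step can be illustrated with a small Python sketch. It is an exhaustive search, not the dynamic programming approach of the paper, and it makes several assumptions: edge weights are plain Euclidean distances between ground-plane positions (appearance is ignored), cameras are visited in a fixed order, and the detections below are invented for illustration. The function name `min_weight_cycle` is hypothetical.

```python
from itertools import product
from math import dist


def min_weight_cycle(cameras):
    """Pick one observation per camera so that the closed cycle
    camera 0 -> 1 -> ... -> K-1 -> 0 has minimum total distance.

    `cameras` is a list of K lists of (x, y) ground-plane positions.
    Exhaustive search: fine for a handful of candidates per camera,
    but exponential in K.
    """
    k = len(cameras)
    best_cost, best_choice = float("inf"), None
    for choice in product(*cameras):
        cost = sum(dist(choice[i], choice[(i + 1) % k]) for i in range(k))
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Invented detections of two players as seen from three cameras.
cam_a = [(10.0, 5.0), (30.0, 20.0)]
cam_b = [(30.5, 19.5), (10.2, 5.1)]
cam_c = [(9.9, 4.8), (29.8, 20.3)]

cost, cycle = min_weight_cycle([cam_a, cam_b, cam_c])
print(cycle)  # the three observations judged to be the same player
```

The lowest-weight cycle groups the three observations that lie closest together, which is the sense in which a cycle "fuses" one player's position across cameras. The search cost grows as the product of the per-camera candidate counts, which is why a smarter strategy is needed at scale.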

via Science Direct A Visualization Framework for Team Sports Captured using Multiple Static Cameras.

Paper in ACM Ubicomp 2013 “Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras”

September 14th, 2013 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Edison Thomaz, Gregory Abowd, ISWC, Mobile Computing, Papers, Ubiquitous Computing No Comments »

  • E. Thomaz, A. Parnami, J. Bidwell, I. Essa, and G. D. Abowd (2013), “Technological Approaches for Addressing Privacy Concerns when Recognizing Eating Behaviors with Wearable Cameras,” in Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), 2013. [PDF] [DOI] [BIBTEX]
    @inproceedings{2013-Thomaz-TAAPCWREBWWC,
      Author = {Edison Thomaz and Aman Parnami and Jonathan Bidwell and Irfan Essa and Gregory D. Abowd},
      Booktitle = {{Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp)}},
      Date-Added = {2013-10-22 18:31:23 +0000},
      Date-Modified = {2014-04-28 17:07:56 +0000},
      Doi = {10.1145/2493432.2493509},
      Month = {September},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Thomaz-TAAPCWREBWWC.pdf},
      Title = {Technological Approaches for Addressing Privacy Concerns when Recognizing Eating Behaviors with Wearable Cameras.},
      Year = {2013},
      Bdsk-Url-1 = {http://dx.doi.org/10.1145/2493432.2493509}}

Abstract

First-person point-of-view (FPPOV) images taken by wearable cameras can be used to better understand people’s eating habits. Human computation is a way to provide effective analysis of FPPOV images in cases where algorithmic approaches currently fail. However, privacy is a serious concern. We provide a framework, the privacy-saliency matrix, for understanding the balance between the eating information in an image and its potential privacy concerns. Using data gathered by 5 participants wearing a lanyard-mounted smartphone, we show how the framework can be used to quantitatively assess the effectiveness of four automated techniques (face detection, image cropping, location filtering and motion filtering) at reducing the privacy-infringing content of images while still maintaining evidence of eating behaviors throughout the day.

via ACM DL Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras.

At ICVSS (International Computer Vision Summer School) 2013, in Calabria, Italy (July 2013)

July 11th, 2013 Irfan Essa Posted in Computational Photography, Computational Photography and Video, Daniel Castro, Matthias Grundmann, Presentations, S. Hussain Raza, Vivek Kwatra No Comments »

Teaching at the ICVSS 2013, in Calabria, Italy, July 2013 (Programme)

Computational Video: Post-processing Methods for Stabilization, Retargeting and Segmentation

Irfan Essa
(This work is in collaboration with
Matthias Grundmann, Daniel Castro, Vivek Kwatra, Mei Han, S. Hussain Raza).

Abstract

We address a variety of challenges for analysis and enhancement of Computational Video. We present novel post-processing methods to bridge the difference between professional and casually shot videos mostly seen on online sites. Our research presents solutions to three well-defined problems: (1) Video stabilization and rolling shutter removal in casually-shot, uncalibrated videos; (2) Content-aware video retargeting; and (3) spatio-temporal video segmentation to enable efficient video annotation. We showcase several real-world applications building on these techniques.

We start by proposing a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. We compute camera paths that are optimally partitioned into constant, linear and parabolic segments mimicking the camera motions employed by professional cinematographers. To achieve this, we propose a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. An additional challenge in videos shot from mobile phones is rolling shutter distortion. Modern CMOS cameras capture the frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. We propose a solution based on a novel mixture model of homographies parametrized by scanline blocks to correct these rolling shutter distortions. Our method relies neither on a priori knowledge of the readout time nor on prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos. We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer.
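The objective described in this paragraph can be written compactly. The following is a sketch in assumed notation rather than a formula quoted from the talk: P_t denotes the smoothed camera path at frame t, and w_1, w_2, w_3 are assumed non-negative weights on the first, second, and third discrete derivatives. Any constraints on the path are omitted.

```latex
\min_{P_1,\dots,P_n}\;
    w_1 \sum_{t} \bigl\lvert P_{t+1} - P_t \bigr\rvert
  + w_2 \sum_{t} \bigl\lvert P_{t+2} - 2P_{t+1} + P_t \bigr\rvert
  + w_3 \sum_{t} \bigl\lvert P_{t+3} - 3P_{t+2} + 3P_{t+1} - P_t \bigr\rvert
```

Each absolute value |x| can be replaced by a slack variable e with the linear constraints -e <= x <= e, which turns the problem into a linear program. Because an L1 penalty tends to drive many terms exactly to zero, the optimal path has stretches where the first derivative vanishes (constant segments), where the second vanishes (linear segments), and where the third vanishes (parabolic segments).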

We address the challenge of changing the aspect ratio of videos by proposing algorithms that retarget videos to fit the form factor of a given device without stretching or letter-boxing. Our approaches use all of the screen’s pixels, while striving to deliver as much video content of the original as possible. First, we introduce a new algorithm that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. Second, we present a technique that builds on the above-mentioned video stabilization approach. We effectively automate classical pan and scan techniques by smoothly guiding a virtual crop window via saliency constraints.

Finally, we introduce an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a region graph over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations, and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation as users interact with the video, enabling efficient annotation of objects within the video.
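The idea of a space-time graph segmented at several levels of granularity can be sketched in a few lines of Python. This is a heavily simplified illustration, not the algorithm from the talk: it assumes grey-level voxels, merges neighbours with union-find whenever their intensity difference is at most a threshold, and recomputes each level from the voxels instead of building a region graph over the previous level. The names `segment` and `hierarchy`, the thresholds, and the tiny video are all hypothetical.

```python
def segment(volume, threshold):
    """Label the voxels of a video volume indexed as volume[t][y][x].

    Voxels adjacent in space or time are merged when their grey-level
    difference is at most `threshold`. Returns one label per voxel.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    parent = list(range(T * H * W))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def idx(t, y, x):
        return (t * H + y) * W + x

    # Edges of the volumetric graph: right, down, and next-frame neighbours.
    for t in range(T):
        for y in range(H):
            for x in range(W):
                for dt, dy, dx in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    t2, y2, x2 = t + dt, y + dy, x + dx
                    if t2 < T and y2 < H and x2 < W:
                        w = abs(volume[t][y][x] - volume[t2][y2][x2])
                        if w <= threshold:
                            ra, rb = find(idx(t, y, x)), find(idx(t2, y2, x2))
                            if ra != rb:
                                parent[rb] = ra
    return [find(i) for i in range(T * H * W)]


def hierarchy(volume, thresholds):
    """One labelling per threshold, ordered from fine to coarse."""
    return [segment(volume, th) for th in sorted(thresholds)]


# Two 2x3 frames: a dark region on the left, a bright column on the right.
video = [
    [[10, 11, 50], [10, 12, 52]],
    [[11, 11, 51], [10, 12, 53]],
]
levels = hierarchy(video, [0, 2, 40])
counts = [len(set(labels)) for labels in levels]
print(counts)  # number of space-time regions at each level
```

Because a larger threshold only ever adds merges, every region at a fine level lies inside exactly one region at each coarser level, which gives the tree structure that lets an application pick its preferred granularity.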

Part of this talk will introduce attendees to the Video Stabilizer on YouTube and the video segmentation system at videosegmentation.com. Please find appropriate videos to test the systems.

Part of the work described above was done at Google, where Matthias Grundmann, Vivek Kwatra and Mei Han work and where Professor Essa is a consultant. Part of the work was carried out by Matthias Grundmann, Daniel Castro and S. Hussain Raza as part of their research as students at Georgia Tech.

Google I/O 2013: Secrets of Video Stabilization on YouTube

May 28th, 2013 Irfan Essa Posted in Computational Photography and Video, Google, In The News, Matthias Grundmann, Presentations, Vivek Kwatra 1 Comment »

Presentation at Google I/O 2013 by Matthias Grundmann, John Gregg, and Vivek Kwatra on our Video Stabilizer on YouTube

Video stabilization is a key component of YouTube’s video enhancement tools and youtube.com/editor. All YouTube uploads are automatically checked for shakiness, and stabilization is suggested if needed. This talk will describe the technical details behind our fully automatic one-click stabilization technology, including aspects such as camera path optimization, rolling shutter detection and removal, distributed computing for real-time previews, and camera shake detection. More info: http://googleresearch.blogspot.com/2012/05/video-stabilization-on-youtube.html

via Secrets of Video Stabilization on YouTube — Google I/O 2013.

Computational Photography MOOC on Coursera comes to a close.

May 7th, 2013 Irfan Essa Posted in Computational Photography, Computational Photography and Video, Coursera, Denis Lantsman No Comments »

The Computational Photography MOOC offering on Coursera came to a close with the following final announcement (abridged here) on May 7, 2013.

Computational photographers:

Thanks for joining us for an engaging 5 weeks of collaboratively learning the wonderful aspects of computational photography. We bid you all farewell now and hope to see some of you in a future reincarnation of this class, building on the feedback provided by many of you. Keep a lookout for the repeat of the same class, and for another class continuing to more advanced topics.

Final graded scores and the certificate of completion will be made available this week. All assignment solutions are available, as requested. We will also keep the class site open for a while.

Do remember that we still welcome your feedback, so use the forums. If you haven’t done so already, please do spend a few minutes to fill out the survey for the last week of class, which is part of a survey we are conducting to understand and evaluate online class offerings like this one.

Again, thanks for participating, and good luck with your future endeavors. And remember to take good pictures, and to have fun computing with photographs.

via Announcements | Computational Photography.

Paper in ICCP 2013 “Post-processing approach for radiometric self-calibration of video”

April 19th, 2013 Irfan Essa Posted in Computational Photography and Video, ICCP, Matthias Grundmann, Papers, Sing Bing Kang No Comments »

  • M. Grundmann, C. McClanahan, S. B. Kang, and I. Essa (2013), “Post-processing Approach for Radiometric Self-Calibration of Video,” in Proceedings of IEEE International Conference on Computational Photography (ICCP), 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
    @inproceedings{2013-Grundmann-PARSV,
      Author = {Matthias Grundmann and Chris McClanahan and Sing Bing Kang and Irfan Essa},
      Booktitle = {{Proceedings of IEEE International Conference on Computational Photography (ICCP)}},
      Date-Added = {2013-06-25 11:54:57 +0000},
      Date-Modified = {2014-04-28 17:09:49 +0000},
      Doi = {10.1109/ICCPhot.2013.6528307},
      Month = {April},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Grundmann-PARSV.pdf},
      Title = {Post-processing Approach for Radiometric Self-Calibration of Video},
      Url = {http://www.cc.gatech.edu/cpl/projects/radiometric},
      Video = {http://www.youtube.com/watch?v=sC942ZB4WuM},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/radiometric},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/ICCPhot.2013.6528307}}

Abstract

We present a novel data-driven technique for radiometric self-calibration of video from an unknown camera. Our approach self-calibrates radiometric variations in video, and is applied as a post-process; there is no need to access the camera, and in particular it is applicable to internet videos. This technique builds on empirical evidence that in video the camera response function (CRF) should be regarded as time-variant, as it changes with scene content and exposure, instead of relying on a single camera response function. We show that a time-varying mixture of responses produces better accuracy and consistently reduces the error in mapping intensity to irradiance when compared to a single response model. Furthermore, our mixture model counteracts the effects of possible nonlinear exposure-dependent intensity perturbations and white-balance changes caused by proprietary camera firmware. We further show how radiometrically calibrated video improves the performance of other video analysis algorithms, enabling a video segmentation algorithm to be invariant to exposure and gain variations over the sequence. We validate our data-driven technique on videos from a variety of cameras and demonstrate the generality of our approach by applying it to internet video.

via IEEE Xplore – Post-processing approach for radiometric self-calibration of video.