
William Mong Distinguished Lecture at the University of Hong Kong on “Video Cameras are Everywhere: Data-Driven Methods for Video Analysis and Enhancement”

December 11th, 2014 Irfan Essa Posted in Computational Photography and Video, Computer Vision, Presentations

Video Cameras are Everywhere: Data-Driven Methods for Video Analysis and Enhancement

Irfan Essa (prof.irfanessa.com)
Georgia Institute of Technology
School of Interactive Computing
GVU and RIM @ GT Centers 

Abstract 

In this talk, I will start by describing the pervasiveness of image and video content and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for the analysis and enhancement of video content. I will begin with some of our earlier work on temporal modeling of video, then lead up to our current work and describe two main projects: (1) our approach to video stabilization, currently implemented and running on YouTube, and its extensions, and (2) a robust and scalable method for video segmentation.

I will describe, in some detail, our video stabilization method, which generates stabilized videos and is in wide use. Our method allows for video stabilization beyond the conventional filtering that only suppresses high-frequency jitter. It also supports removal of the rolling shutter distortions common in modern CMOS cameras, which capture each frame one scan line at a time, resulting in non-rigid image distortions such as shear and wobble. Our method does not rely on a priori knowledge and works on video from any camera or on legacy footage. I will showcase examples of this approach and also discuss how this method is launched and running on YouTube, with millions of users.
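
As a concrete (and deliberately simplified) illustration: the sketch below estimates per-frame motion, low-pass filters the accumulated camera path with a Gaussian, and warps frames by the correction. This is exactly the kind of conventional filtering our method goes beyond (the published approach solves for an L1-optimal camera path), and it is not the YouTube implementation; all parameter values are illustrative.

    # Minimal stabilization sketch in Python/OpenCV -- a conventional
    # low-pass-filtering baseline, NOT the L1-optimal method from the talk.
    import cv2
    import numpy as np

    def stabilize(frames, sigma=15):
        """frames: list of grayscale images; returns stabilized frames."""
        # 1. Estimate inter-frame similarity transforms from tracked
        #    features (no handling of degenerate cases in this sketch).
        dx, dy, da = [0.0], [0.0], [0.0]
        for prev, curr in zip(frames, frames[1:]):
            pts = cv2.goodFeaturesToTrack(prev, maxCorners=200,
                                          qualityLevel=0.01, minDistance=20)
            nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)
            ok = status.ravel() == 1
            m, _ = cv2.estimateAffinePartial2D(pts[ok], nxt[ok])
            dx.append(m[0, 2]); dy.append(m[1, 2])
            da.append(np.arctan2(m[1, 0], m[0, 0]))
        # 2. Accumulate the raw camera path and Gaussian-smooth it
        #    (edge effects of 'same'-mode convolution are ignored here).
        path = np.cumsum(np.stack([dx, dy, da], axis=1), axis=0)
        k = cv2.getGaussianKernel(6 * sigma + 1, sigma).ravel()
        smooth = np.stack([np.convolve(p, k, mode='same') for p in path.T]).T
        # 3. Warp each frame by the smooth-minus-raw correction.
        out = []
        for f, (x, y, a) in zip(frames, smooth - path):
            c, s = np.cos(a), np.sin(a)
            w = np.array([[c, -s, x], [s, c, y]], dtype=np.float32)
            out.append(cv2.warpAffine(f, w, f.shape[1::-1]))
        return out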

Then I will describe an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. This hierarchical approach generates high-quality segmentations, and we demonstrate their use as users interact with the video, enabling efficient annotation of objects within it (see the sketch below). I will also show some recent work on how this segmentation and annotation can be used for dynamic scene understanding.
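
To make the annotation claim concrete, here is a toy sketch (an assumed data layout, not our actual tooling): if the segmentation assigns every pixel a region ID that persists across frames, then a single click can label an object for the whole video.

    # Hedged illustration of click-based video annotation on top of a
    # spatio-temporal segmentation. `seg` is assumed to be a (T, H, W)
    # integer array in which one ID tracks one region through time.
    import numpy as np

    def annotate_by_click(seg, t, x, y, label, annotations=None):
        """Label the spatio-temporal region under a click at (x, y), frame t."""
        annotations = {} if annotations is None else annotations
        region_id = int(seg[t, y, x])   # region the user clicked on
        annotations[region_id] = label  # one click covers every frame
        return annotations

    def labels_for_frame(seg, annotations, t):
        """Return an (H, W) object array of labels for frame t."""
        out = np.full(seg[t].shape, None, dtype=object)
        for rid, label in annotations.items():
            out[seg[t] == rid] = label
        return out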

Bio: http://prof.irfanessa.com/bio 


Computation + Journalism Symposium 2014

October 25th, 2014 Irfan Essa Posted in Computational Journalism, Events, Nick Diakopoulos

Hosted the 3rd Computation + Journalism Symposium 2014 at The Brown Institute for Media Innovation in Pulitzer Hall, Columbia University, New York, NY, USA, on October 24-25. It was a huge success, with about 250 attendees and a mixture of invited panels and contributed papers. More details below:

Jon Kleinberg kicked off the meeting with a very exciting keynote. Videos of all sessions should be available from the symposium website. The next C+J event will be in a year; stay tuned for more details. I was the co-organizer of this event with Nick Diakopoulos and Mark Hansen.



Paper in BMVC (2014): “Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries”

September 5th, 2014 Irfan Essa Posted in Computational Photography and Video, PAMI/ICCV/CVPR/ECCV, S. Hussain Raza

  • S. H. Raza, O. Javed, A. Das, H. Sawhney, H. Cheng, and I. Essa (2014), “Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries,” in Proceedings of British Machine Vision Conference (BMVC), Nottingham, UK, 2014. [PDF] [WEBSITE] [BIBTEX]
    @inproceedings{2014-Raza-DEFVUGCOBDEFVUGCOB,
      Address = {Nottingham, UK},
      Author = {Syed Hussain Raza and Omar Javed and Aveek Das and Harpreet Sawhney and Hui Cheng and Irfan Essa},
      Booktitle = {{Proceedings of British Machine Vision Conference (BMVC)}},
      Date-Added = {2014-08-30 12:56:03 +0000},
      Date-Modified = {2014-11-10 16:10:07 +0000},
      Month = {September},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2014-Raza-DEFVUGCOBDEFVUGCOB.pdf},
      Title = {Depth Extraction from Videos Using Geometric Context and Occlusion Boundaries},
      Url = {http://www.cc.gatech.edu/cpl/projects/videodepth/},
      Year = {2014},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/videodepth/}}

We present an algorithm to estimate depth in dynamic video scenes.

We propose to learn and infer depth in videos from appearance, motion, occlusion boundaries, and geometric context of the scene. Using our method, depth can be estimated from unconstrained videos, with no requirement of camera pose estimation and with significant background/foreground motions. We start by decomposing a video into spatio-temporal regions. For each spatio-temporal region, we learn the relationship of depth to visual appearance, motion, and geometric classes. Then we infer the depth information of new scenes using a piecewise planar parametrization estimated within a Markov random field (MRF) framework, combining the learned appearance-to-depth mappings with occlusion-boundary-guided smoothness constraints. Subsequently, we perform temporal smoothing to obtain temporally consistent depth maps.
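
As a rough stand-in for the final step only (the MRF inference and the paper's actual smoothing scheme are more involved), temporal consistency can be illustrated with an exponential moving average over per-frame depth estimates; the function below is hypothetical, not the paper's implementation.

    # Toy temporal smoothing of per-frame depth maps via an exponential
    # moving average; inputs are assumed to be NumPy (H, W) float arrays.
    def temporally_smooth_depth(depth_maps, alpha=0.7):
        """alpha weights the current frame's raw estimate (0..1)."""
        smoothed, prev = [], None
        for d in depth_maps:
            prev = d if prev is None else alpha * d + (1 - alpha) * prev
            smoothed.append(prev)
        return smoothed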

To evaluate our depth estimation algorithm, we provide a novel dataset with ground-truth depth for outdoor video scenes. We present a thorough evaluation of our algorithm on our new dataset and on the publicly available Make3D static image dataset.


Paper in CVPR 2014 “Efficient Hierarchical Graph-Based Segmentation of RGBD Videos”

June 22nd, 2014 Irfan Essa Posted in Computer Vision, Henrik Christensen, Papers, Steven Hickson

  • S. Hickson, S. Birchfield, I. Essa, and H. Christensen (2014), “Efficient Hierarchical Graph-Based Segmentation of RGBD Videos,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014. [PDF] [WEBSITE] [BIBTEX]
    @inproceedings{2014-Hickson-EHGSRV,
      Author = {Steven Hickson and Stan Birchfield and Irfan Essa and Henrik Christensen},
      Booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}},
      Date-Added = {2014-06-22 14:44:17 +0000},
      Date-Modified = {2014-06-22 14:53:26 +0000},
      Month = {June},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2014-Hickson-EHGSRV.pdf},
      Title = {Efficient Hierarchical Graph-Based Segmentation of RGBD Videos},
      Url = {http://www.cc.gatech.edu/cpl/projects/4dseg},
      Year = {2014},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/4dseg}}

Abstract

We present an efficient and scalable algorithm for segmenting 3D RGBD point clouds by combining depth, color, and temporal information using a multistage, hierarchical graph-based approach. Our algorithm processes a moving window over several point clouds to group similar regions over a graph, resulting in an initial over-segmentation. These regions are then merged to yield a dendrogram using agglomerative clustering via a minimum spanning tree algorithm. Bipartite graph matching at a given level of the hierarchical tree yields the final segmentation of the point clouds by maintaining region identities over arbitrarily long periods of time. We show that a multistage segmentation with depth then color yields better results than a linear combination of depth and color. Due to its incremental processing, our algorithm can process videos of any length and in a streaming pipeline. The algorithm’s ability to produce robust, efficient segmentation is demonstrated with numerous experimental results on challenging sequences from our own as well as public RGBD data sets.
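
For intuition, the merge step can be sketched in a simplified single-stage form (the actual algorithm is multistage, applies depth before color, and processes a moving window of point clouds): sorting region-graph edges by dissimilarity and merging with a union-find structure is the classic Kruskal-style route to the MST dendrogram described above.

    # Simplified Kruskal-style agglomerative merging over a region graph.
    # Everything here is illustrative; it is not the paper's implementation.
    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))
        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x
        def union(self, a, b):
            self.parent[self.find(a)] = self.find(b)

    def merge_regions(num_regions, edges, threshold):
        """edges: (weight, i, j) tuples, weight = region dissimilarity
        (e.g. depth difference in one stage, color difference in the next).
        Merging in ascending weight order traces the minimum spanning tree;
        stopping at `threshold` cuts the dendrogram at one hierarchy level."""
        uf = UnionFind(num_regions)
        for w, i, j in sorted(edges):
            if w <= threshold and uf.find(i) != uf.find(j):
                uf.union(i, j)
        return [uf.find(r) for r in range(num_regions)]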


PhD Thesis (2014) by Yachna Sharma “Surgical Skill Assessment Using Motion Texture Analysis”

May 2nd, 2014 Irfan Essa Posted in Medical, PhD, Yachna Sharma

Thesis title: Surgical Skill Assessment Using Motion Texture Analysis

Yachna Sharma, Ph.D. Candidate, ECE
http://users.ece.gatech.edu/~ysharma3/

Committee:

Prof. Irfan Essa (advisor), College of Computing
Prof. Mark A. Clements (co-advisor), School of Electrical and Computer Engineering
Prof. David Anderson, School of Electrical and Computer Engineering
Prof. Anthony Yezzi, School of Electrical and Computer Engineering
Prof. Christopher F. Barnes, School of Electrical and Computer Engineering
Dr. Thomas Ploetz, Culture Lab, School of Computing Science, Newcastle University, United Kingdom
Dr. Eric L. Sarin, Division of Cardiothoracic Surgery, Department of Surgery, Emory University School of Medicine

Abstract:

The objective of this Ph.D. research is to design and develop a framework for automated assessment of surgical skills. Automated assessment can help expedite the manual assessment process and provide unbiased evaluations with possible dexterity feedback.

Evaluation of surgical skills is an important aspect of medical student training. Current practices rely on manual evaluations from faculty and residents and are time consuming. Solutions proposed in the literature involve retrospective evaluations, such as watching offline videos; these require the precious time and attention of expert surgeons, and assessments may vary from one surgeon to another. With recent advancements in computer vision and machine learning techniques, retrospective video evaluation is best delegated to computer algorithms.

Skill assessment is a challenging task requiring expert domain knowledge that may be difficult to translate into algorithms. To emulate this human observation process, an appropriate data collection mechanism is required to track motion of the surgeon’s hand in an unrestricted manner. In addition, it is essential to identify skill defining motion dynamics and skill relevant hand locations.

This Ph.D. research aims to address the limitations of manual skill assessment by developing an automated motion analysis framework. Specifically, we propose (1) to design and implement quantitative features to capture fine motion details from surgical video data, (2) to identify and test the efficacy of a core subset of features in classifying surgical students into different expertise levels, (3) to derive absolute skill scores using regression methods, and (4) to perform dexterity analysis using motion data from different hand locations.


PhD Thesis (2014) by S. Hussain Raza “Temporally Consistent Semantic Segmentation in Videos”

May 2nd, 2014 Irfan Essa Posted in Computational Photography and Video, PhD, S. Hussain Raza

Title: Temporally Consistent Semantic Segmentation in Videos

S. Hussain Raza, Ph.D. Candidate in ECE (https://sites.google.com/site/shussainraza5/)

Committee:

Prof. Irfan Essa (advisor), School of Interactive Computing
Prof. David Anderson (co-advisor), School of Electrical and Computer Engineering
Prof. Frank Dellaert, School of Interactive Computing
Prof. Anthony Yezzi, School of Electrical and Computer Engineering
Prof. Chris Barnes, School of Electrical and Computer Engineering
Prof. Rahul Sukthankar, Department of Computer Science and Robotics, Carnegie Mellon University

Abstract:

The objective of this thesis research is to develop algorithms for temporally consistent semantic segmentation in videos. Though many different forms of semantic segmentation exist, this research focuses on the problem of temporally consistent holistic scene understanding in outdoor videos. Holistic scene understanding requires an understanding of many individual aspects of the scene, including 3D layout, objects present, occlusion boundaries, and depth. Such a description of a dynamic scene would be useful for many robotic applications, including object reasoning, 3D perception, video analysis, video coding, segmentation, navigation, and activity recognition.

Scene understanding has been studied with great success for still images. However, scene understanding in videos requires additional approaches to account for temporal variation and dynamic information and to exploit causality. As a first step, image-based scene understanding methods can be applied directly to individual video frames to generate a description of the scene. However, these methods do not exploit temporal information across neighboring frames, and, lacking temporal consistency, they can produce inconsistent labels across frames. This inconsistency can impact performance, as scene labels suddenly change between frames.

The objective of this study is to develop temporally consistent scene-descriptive algorithms by processing videos efficiently, exploiting causality and data redundancy, and catering for scene dynamics. Specifically, we achieve our research objectives by (1) extracting geometric context from videos to give the broad 3D structure of the scene with all objects present, (2) detecting occlusion boundaries in videos due to depth discontinuity, and (3) estimating depth in videos by combining monocular and motion features with semantic features and occlusion boundaries.


PhD Thesis by Zahoor Zafrulla “Automatic recognition of American Sign Language Classifiers”

May 2nd, 2014 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Face and Gesture, PhD, Thad Starner, Zahoor Zafrulla

Title: Automatic recognition of American Sign Language Classifiers

Zahoor Zafrulla
School of Interactive Computing
College of Computing
Georgia Institute of Technology
http://www.cc.gatech.edu/grads/z/zahoor/

Committee:

Dr. Thad Starner (Advisor, School of Interactive Computing, Georgia Tech)
Dr. Irfan Essa (Co-Advisor, School of Interactive Computing, Georgia Tech)
Dr. Jim Rehg (School of Interactive Computing, Georgia Tech)
Dr. Harley Hamilton (School of Interactive Computing, Georgia Tech)
Dr. Vassilis Athitsos (Computer Science and Engineering Department, University of Texas at Arlington)

Summary:

Automatically recognizing classifier-based grammatical structures of American Sign Language (ASL) is a challenging problem. Classifiers in ASL utilize surrogate hand shapes for people or “classes” of objects and provide information about their location, movement, and appearance. In the past, researchers have focused on recognition of fingerspelling, isolated signs, facial expressions, and interrogative words like WH-questions (e.g., Who, What, Where, and When). Challenging problems such as recognition of ASL sentences and classifier-based grammatical structures remain relatively unexplored in the field of ASL recognition.

One application of recognition of classifiers is toward creating educational games to help young deaf children acquire language skills. Previous work developed CopyCat, an educational ASL game that requires children to engage in a progressively more difficult expressive signing task as they advance through the game.

We have shown that by leveraging context we can use verification, in place of recognition, to boost machine performance for determining if the signed responses in an expressive signing task, like in the CopyCat game, are correct or incorrect. We have demonstrated that the quality of a machine verifier’s ability to identify the boundary of the signs can be improved by using a novel two-pass technique that combines signed input in both forward and reverse directions. Additionally, we have shown that we can reduce CopyCat’s dependency on custom manufactured hardware by using an off-the-shelf Microsoft Kinect depth camera to achieve similar verification performance. Finally, we show how we can extend our ability to recognize sign language by leveraging depth maps to develop a method using improved hand detection and hand shape classification to recognize selected classifier-based grammatical structures of ASL.
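
A very loose sketch of the two-pass idea (the verifier itself is HMM-based, and everything below is invented for illustration): score each candidate sign boundary once with the model run forward and once on the time-reversed input, then keep the boundary the two passes jointly support.

    # Hypothetical combination of forward- and reverse-pass boundary scores.
    import numpy as np

    def best_boundary(score_fwd, score_rev):
        """score_fwd[t] / score_rev[t]: boundary scores at frame t from the
        forward pass and from the pass over the time-reversed input."""
        combined = np.asarray(score_fwd) + np.asarray(score_rev)
        return int(np.argmax(combined))  # frame both directions agree on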


Paper in ISBI (2014): “Automated Surgical OSATS Prediction from Videos”

April 28th, 2014 Irfan Essa Posted in Behavioral Imaging, Health Systems, Medical, Papers, Thomas Ploetz, Yachna Sharma

  • Y. Sharma, T. Ploetz, N. Hammerla, S. Mellor, R. McNaney, P. Oliver, S. Deshmukh, A. McCaskie, and I. Essa (2014), “Automated Surgical OSATS Prediction from Videos,” in Proceedings of IEEE International Symposium on Biomedical Imaging, Beijing, CHINA, 2014. [PDF] [BIBTEX]
    @inproceedings{2014-Sharma-ASOPFV,
      Address = {Beijing, CHINA},
      Author = {Yachna Sharma and Thomas Ploetz and Nils Hammerla and Sebastian Mellor and Roisin McNaney and Patrick Oliver and Sandeep Deshmukh and Andrew McCaskie and Irfan Essa},
      Booktitle = {{Proceedings of IEEE International Symposium on Biomedical Imaging}},
      Date-Added = {2014-04-28 16:51:07 +0000},
      Date-Modified = {2014-04-28 17:07:29 +0000},
      Month = {April},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2014-Sharma-ASOPFV.pdf},
      Title = {Automated Surgical {OSATS} Prediction from Videos},
      Year = {2014}}

Abstract

The assessment of surgical skills is an essential part of medical training. The prevalent manual evaluations by expert surgeons are time consuming and often their outcomes vary substantially from one observer to another. We present a video-based framework for automated evaluation of surgical skills based on the Objective Structured Assessment of Technical Skills (OSATS) criteria. We encode the motion dynamics via frame kernel matrices, and represent the motion granularity by texture features. Linear discriminant analysis is used to derive a reduced dimensionality feature space followed by linear regression to predict OSATS skill scores. We achieve statistically significant correlation (p-value < 0.01) between the ground-truth (given by domain experts) and the OSATS scores predicted by our framework.
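
The overall shape of this pipeline, with the frame-kernel and texture-feature computation elided and random placeholder data standing in for real features and expert scores, might look like the sketch below (scikit-learn names; not the paper's code):

    # Hedged pipeline sketch: LDA for dimensionality reduction, then linear
    # regression to predict OSATS scores. All data here are placeholders.
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 200))                    # per-video texture features
    skill_level = np.repeat([0, 1, 2], [14, 13, 13])  # discrete expertise labels
    osats_score = rng.uniform(5, 20, size=40)         # expert ground-truth scores

    lda = LinearDiscriminantAnalysis(n_components=2).fit(X, skill_level)
    Z = lda.transform(X)                          # reduced feature space
    reg = LinearRegression().fit(Z, osats_score)
    r, p = pearsonr(osats_score, reg.predict(Z))  # correlation, as reported
    print(f"r = {r:.2f}, p = {p:.3g}")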


Computational Journalist Nick Diakopoulos Appointed Assistant Professor at Philip Merrill College of Journalism, U of Maryland

April 2nd, 2014 Irfan Essa Posted in Computational Journalism, In The News, Nick Diakopoulos

Congratulations to my Ph.D. student Nicholas Diakopoulos, and best wishes in his new position.

COLLEGE PARK, Md. – Computational journalist Nicholas A. Diakopoulos will be the newest assistant professor at the Philip Merrill College of Journalism. Dean Lucy Dalglish announced the appointment today.

….

With a background in computer science and human-computer interaction, Diakopoulos received his Ph.D. from the School of Interactive Computing at Georgia Tech.  He was also a computing innovation fellow at the School of Communication and Information at Rutgers University from 2009-2011.

via Computational Journalist Nick Diakopoulos Appointed Assistant Professor.


Two Ph.D. defenses on the same day. A first for me!

April 2nd, 2014 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Health Systems, PhD, S. Hussain Raza, Students, Yachna Sharma

Today, two of my Ph.D. students defended their dissertations, back to back. Congratulations to both; they are both done.

Thesis title: Surgical Skill Assessment Using Motion Texture Analysis
Student: Yachna Sharma, Ph.D. Candidate in ECE
http://users.ece.gatech.edu/~ysharma3/
Date/Time: 2nd April, 1:00 pm

Title: Temporally Consistent Semantic Segmentation in Videos
S. Hussain Raza, Ph.D. Candidate in ECE
https://sites.google.com/site/shussainraza5/
Date/Time: 2nd April, 1:00 pm

Location: CSIP Library, Room 5186, Centergy One Building
