Paper in ISBI 2014 conference entitled “Automated Surgical OSATS Prediction from Videos”

April 28th, 2014 Irfan Essa Posted in Behavioral Imaging, Health Systems, Medical, Papers, Thomas Ploetz, Yachna Sharma

  • Y. Sharma, T. Ploetz, N. Hammerla, S. Mellor, R. McNaney, P. Oliver, S. Deshmukh, A. McCaskie, and I. Essa (2014), “Automated Surgical OSATS Prediction from Videos,” in Proceedings of IEEE International Symposium on Biomedical Imaging, Beijing, CHINA, 2014. [PDF] [BIBTEX]
      Address = {Beijing, CHINA},
      Author = {Yachna Sharma and Thomas Ploetz and Nils Hammerla and Sebastian Mellor and Roisin McNaney and Patrick Oliver and Sandeep Deshmukh and Andrew McCaskie and Irfan Essa},
      Booktitle = {{Proceedings of IEEE International Symposium on Biomedical Imaging}},
      Date-Added = {2014-04-28 16:51:07 +0000},
      Date-Modified = {2014-04-28 17:07:29 +0000},
      Month = {April},
      Pdf = {},
      Title = {Automated Surgical {OSATS} Prediction from Videos},
      Year = {2014}}


The assessment of surgical skills is an essential part of medical training. The prevalent manual evaluations by expert surgeons are time-consuming, and their outcomes often vary substantially from one observer to another. We present a video-based framework for automated evaluation of surgical skills based on the Objective Structured Assessment of Technical Skills (OSATS) criteria. We encode the motion dynamics via frame kernel matrices and represent the motion granularity by texture features. Linear discriminant analysis is used to derive a reduced-dimensionality feature space, followed by linear regression to predict OSATS skill scores. We achieve statistically significant correlation (p-value < 0.01) between the ground-truth scores (given by domain experts) and the OSATS scores predicted by our framework.
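The pipeline in the abstract (frame kernel matrices, features derived from them, LDA, then linear regression) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the RBF kernel and its `sigma`, and the hypothetical `kernel_summary` (a crude stand-in for the texture features the paper actually computes), are all assumptions. LDA also needs discrete class labels, so OSATS scores are assumed to be binned into skill levels before projection.

```python
import numpy as np

def frame_kernel_matrix(frames: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """T x T RBF similarity between per-frame feature vectors (assumed kernel)."""
    sq = np.sum(frames ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * frames @ frames.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_summary(K: np.ndarray, lags) -> np.ndarray:
    """Hypothetical stand-in for texture features: mean similarity at each frame lag."""
    return np.array([np.diagonal(K, offset=lag).mean() for lag in lags])

def lda_projection(X: np.ndarray, y: np.ndarray, n_components: int,
                   ridge: float = 1e-6) -> np.ndarray:
    """Fisher LDA directions (as columns) from within/between-class scatter."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of Sw^-1 Sb; a small ridge keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + ridge * np.eye(d), Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:n_components]].real

def fit_linear_regression(Z: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Least-squares weights mapping Z to scores; the last entry is the intercept."""
    A = np.hstack([Z, np.ones((Z.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, scores, rcond=None)
    return coef

def predict(Z: np.ndarray, coef: np.ndarray) -> np.ndarray:
    return Z @ coef[:-1] + coef[-1]
```

Given a per-video feature matrix `X` and binned skill labels `y`, the projected features `Z = X @ lda_projection(X, y, 2)` would feed `fit_linear_regression(Z, osats_scores)`. `np.corrcoef(predicted, expert_scores)[0, 1]` then gives the Pearson correlation; `scipy.stats.pearsonr` additionally returns a p-value of the kind reported above.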

Computational Journalist Nick Diakopoulos Appointed Assistant Professor at Philip Merrill College of Journalism, U of Maryland

April 2nd, 2014 Irfan Essa Posted in Computational Journalism, In The News, Nick Diakopoulos

Congratulations to my Ph.D. student Nicholas Diakopoulos, and best wishes in his new position.

COLLEGE PARK, Md. – Computational journalist Nicholas A. Diakopoulos will be the newest assistant professor at the Philip Merrill College of Journalism. Dean Lucy Dalglish announced the appointment today.


With a background in computer science and human-computer interaction, Diakopoulos received his Ph.D. from the School of Interactive Computing at Georgia Tech.  He was also a computing innovation fellow at the School of Communication and Information at Rutgers University from 2009-2011.

via Computational Journalist Nick Diakopoulos Appointed Assistant Professor.

Two Ph.D. Defenses on the Same Day. A First for Me!

April 2nd, 2014 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Health Systems, PhD, S. Hussain Raza, Students, Yachna Sharma

Today, two of my Ph.D. students defended their dissertations, back to back. Congratulations to both; they are both done.

Thesis title: Surgical Skill Assessment Using Motion Texture Analysis
Student: Yachna Sharma, Ph.D. Candidate in ECE
Date/Time: 2nd April, 1:00 pm

Thesis title: Temporally Consistent Semantic Segmentation in Videos
Student: S. Hussain Raza, Ph.D. Candidate in ECE
Date/Time: 2nd April, 1:00 pm

Location: CSIP Library, Room 5186, CenterGy One Building


Atlanta Magazine features Thad Starner: “Magnifying glass”

March 3rd, 2014 Irfan Essa Posted in In The News, Thad Starner, Ubiquitous Computing

A wonderful write-up on my friend and colleague Thad Starner in Atlanta Magazine. Worth a read for sure.

“The guy with the computer on his face.” This would have been a fair description of Starner at almost any time over the past twenty years. He first built his own wearable computer with a head-mounted display in 1993, and has donned some version or another of the computer-eyepiece-Internet system most days since then. But over the previous year, something changed.

via Magnifying glass – Features – Atlanta Magazine.

NAE elects Prof. Alex (Sandy) Pentland as a Member

March 1st, 2014 Irfan Essa Posted in In The News, Sandy Pentland

Congratulations to my Ph.D. advisor, Sandy Pentland, on being elected to the National Academy of Engineering.

“For contributions to computer vision and technologies for measuring human social behavior.”

via NAE Website – Prof. Alex Pentland.

Spring 2014 term begins; teaching CS 4464/6465 (Computational Journalism) and CS 4001 (Computerization and Society)

January 6th, 2014 Irfan Essa Posted in IROS/ICRA, ISWC, PAMI/ICCV/CVPR/ECCV

Welcome to the Spring 2014 term, and happy 2014 to all. This term I am teaching CS 4464/6465 (Computational Journalism) and CS 4001 (Computerization and Society) at Georgia Tech. The following links provide more information on both classes.

  • CS 4464 / CS 6465 Computational Journalism: This class is aimed at understanding the computational and technological advancements in the area of journalism. The primary focus is on the study of technologies for developing new tools for (a) sense-making from diverse news information sources, (b) the impact of more and cheaper networked sensors, (c) collaborative human models for information aggregation and sense-making, (d) mashups and the use of programming in journalism, (e) the impact of mobile computing and data gathering, (f) computational approaches to information quality, (g) data mining for personalization and aggregation, and (h) citizen journalism.
  • CS 4001 Computerization and Society: Although Computing, Society and Professionalism is a required course for CS majors, it is not a typical computer science course. Rather than dealing with the technical content of computing, it addresses the effects of computing on individuals, organizations, and society, and on what your responsibilities are as a computing professional in light of those impacts. The topic is a very broad one and one that you will have to deal with almost every day of your professional life. The issues are sometimes as intellectually deep as some of the greatest philosophical writings in history – and sometimes as shallow as a report on the evening TV news. This course can do little more than introduce you to the topics, but, if successful, will change the way you view the technology with which you work. You will do a lot of reading, analyzing, and communicating (verbally and in writing) in this course. It will require your active participation throughout the semester and should be fun and enlightening.

Paper in CVIU 2013 “A Visualization Framework for Team Sports Captured using Multiple Static Cameras”

October 3rd, 2013 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Jessica Hodgins, PAMI/ICCV/CVPR/ECCV, Papers, Raffay Hamid, Sports Visualization

  • R. Hamid, R. Kumar, J. Hodgins, and I. Essa (2013), “A Visualization Framework for Team Sports Captured using Multiple Static Cameras,” Computer Vision and Image Understanding, 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
      Author = {Raffay Hamid and Ramkrishan Kumar and Jessica Hodgins and Irfan Essa},
      Date-Added = {2013-10-22 13:42:46 +0000},
      Date-Modified = {2014-04-28 17:09:21 +0000},
      Doi = {10.1016/j.cviu.2013.09.006},
      Issn = {1077-3142},
      Journal = {{Computer Vision and Image Understanding}},
      Number = {0},
      Pages = {-},
      Pdf = {},
      Title = {A Visualization Framework for Team Sports Captured using Multiple Static Cameras},
      Url = {},
      Video = {},
      Year = {2013},
      Bdsk-Url-1 = {},
      Bdsk-Url-2 = {},
      Bdsk-Url-3 = {}}


We present a novel approach for robust localization of multiple people observed using a set of static cameras. We use this location information to generate a visualization of the virtual offside line in soccer games. To compute the position of the offside line, we need to localize players’ positions and identify their team roles. We solve the problem of fusing corresponding players’ positional information by finding minimum weight K-length cycles in a complete K-partite graph. Each partite of the graph corresponds to one of the K cameras, whereas each node of a partite encodes the position and appearance of a player observed from a particular camera. To find the minimum weight cycles in this graph, we use a dynamic programming based approach that varies over a continuum from maximally to minimally greedy in terms of the number of graph-paths explored at each iteration. We present proofs for the efficiency and performance bounds of our algorithms. Finally, we demonstrate the robustness of our framework by testing it on 82,000 frames of soccer footage captured over eight different illumination conditions, play types, and team attire. Our framework runs in near-real time, and processes video from 3 full HD cameras in about 0.4 seconds for each set of corresponding 3 frames.

via Science Direct A Visualization Framework for Team Sports Captured using Multiple Static Cameras.
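The dynamic-programming search over the K-partite graph can be sketched as follows. The hypothetical `beam` parameter limits how many partial paths survive each step: `beam=1` is maximally greedy, while `beam=None` keeps the best path into every node, which is exact for a fixed start node and a fixed camera order. This is an illustration under simplifying assumptions, not the paper's algorithm: the camera order is fixed, the edge weight is plain Euclidean ground-plane distance (the abstract says nodes also encode appearance), and only a single cycle is found.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

def edge_weight(a: Point, b: Point) -> float:
    # Hypothetical weight: ground-plane distance between two detections.
    return math.dist(a, b)

def min_weight_cycle(partites: Sequence[Sequence[Point]],
                     beam: Optional[int] = None) -> Tuple[float, List[int]]:
    """Cheapest cycle taking one node from each partite (camera) in order 0..K-1."""
    K = len(partites)
    best_cost, best_cycle = math.inf, []
    for s, start in enumerate(partites[0]):
        # Keys index nodes in the current partite; values are (cost, path of node indices).
        states = {s: (0.0, [s])}
        for k in range(1, K):
            nxt = {}
            for i, (cost, path) in states.items():
                prev = partites[k - 1][i]
                for j, node in enumerate(partites[k]):
                    c = cost + edge_weight(prev, node)
                    if j not in nxt or c < nxt[j][0]:
                        nxt[j] = (c, path + [j])  # best path ending at node j
            if beam is not None:
                # Keep only the `beam` cheapest partial paths (greedier search).
                nxt = dict(sorted(nxt.items(), key=lambda kv: kv[1][0])[:beam])
            states = nxt
        for i, (cost, path) in states.items():
            total = cost + edge_weight(partites[K - 1][i], start)  # close the cycle
            if total < best_cost:
                best_cost, best_cycle = total, path
    return best_cost, best_cycle

# Three cameras, each seeing two players near (0, 0) and (10, 10).
cams = [[(0.0, 0.0), (10.0, 10.0)],
        [(10.2, 9.9), (0.1, 0.2)],
        [(0.3, -0.1), (9.8, 10.1)]]
cost, cycle = min_weight_cycle(cams)
```

Repeating the search after removing matched nodes is one simple way to associate every player; that step is an assumption here, not taken from the paper.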

Paper in ACM Ubicomp 2013 “Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras”

September 14th, 2013 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Edison Thomaz, Gregory Abowd, ISWC, Mobile Computing, Papers, Ubiquitous Computing

  • E. Thomaz, A. Parnami, J. Bidwell, I. Essa, and G. D. Abowd (2013), “Technological Approaches for Addressing Privacy Concerns when Recognizing Eating Behaviors with Wearable Cameras,” in Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), 2013. [PDF] [DOI] [BIBTEX]
      Author = {Edison Thomaz and Aman Parnami and Jonathan Bidwell and Irfan Essa and Gregory D. Abowd},
      Booktitle = {{Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp)}},
      Date-Added = {2013-10-22 18:31:23 +0000},
      Date-Modified = {2014-04-28 17:07:56 +0000},
      Doi = {10.1145/2493432.2493509},
      Month = {September},
      Pdf = {},
      Title = {Technological Approaches for Addressing Privacy Concerns when Recognizing Eating Behaviors with Wearable Cameras},
      Year = {2013},
      Bdsk-Url-1 = {}}


First-person point-of-view (FPPOV) images taken by wearable cameras can be used to better understand people’s eating habits. Human computation is a way to provide effective analysis of FPPOV images in cases where algorithmic approaches currently fail. However, privacy is a serious concern. We provide a framework, the privacy-saliency matrix, for understanding the balance between the eating information in an image and its potential privacy concerns. Using data gathered by 5 participants wearing a lanyard-mounted smartphone, we show how the framework can be used to quantitatively assess the effectiveness of four automated techniques (face detection, image cropping, location filtering and motion filtering) at reducing the privacy-infringing content of images while still maintaining evidence of eating behaviors throughout the day.

via ACM DL Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras.
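One way to picture the privacy-saliency matrix is as a 2 x 2 tally of images by whether they still show eating evidence and whether they contain privacy-infringing content, with each automated technique judged by how it moves images between cells. The binary 2 x 2 form, the `LabeledImage` type, and the `hypothetical_face_blur` effect below are assumptions made for illustration, not the paper's exact definitions.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class LabeledImage:
    eating: bool   # image still shows evidence of eating (saliency)
    private: bool  # image contains privacy-infringing content

Cell = Tuple[bool, bool]

def privacy_saliency_matrix(images: Iterable[LabeledImage]) -> Dict[Cell, int]:
    """Count images in each (eating, private) cell of a 2 x 2 matrix."""
    counts = Counter((img.eating, img.private) for img in images)
    return {(e, p): counts[(e, p)] for e in (True, False) for p in (True, False)}

def apply_technique(images: Iterable[LabeledImage],
                    technique: Callable[[LabeledImage], LabeledImage]) -> List[LabeledImage]:
    return [technique(img) for img in images]

def hypothetical_face_blur(img: LabeledImage) -> LabeledImage:
    # Hypothetical effect: removes privacy-infringing content, keeps eating evidence.
    return replace(img, private=False)

def desirable_fraction(m: Dict[Cell, int]) -> float:
    """Share of images that show eating without privacy-infringing content."""
    total = sum(m.values())
    return m[(True, False)] / total if total else 0.0

imgs = [LabeledImage(True, True), LabeledImage(True, False),
        LabeledImage(False, True), LabeledImage(False, False)]
before = privacy_saliency_matrix(imgs)
after = privacy_saliency_matrix(apply_technique(imgs, hypothetical_face_blur))
```

Comparing `before` and `after` for each technique (face detection, cropping, location or motion filtering) is the kind of quantitative assessment the abstract describes; real techniques would also sometimes destroy eating evidence, which this toy blur does not model.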

Paper in ACM KDD 2013 “Detecting insider threats in a real corporate database of computer usage activity”

August 11th, 2013 Irfan Essa Posted in AAAI/IJCAI/UAI, Josh Jones, Vinay Bettadapura

  • T. E. Senator, H. G. Goldberg, A. Memory, W. T. Young, B. Rees, R. Pierce, D. Huang, M. Reardon, D. A. Bader, E. Chow, I. Essa, J. Jones, V. Bettadapura, D. H. Chau, O. Green, O. Kaya, A. Zakrzewska, E. Briscoe, R. I. L. Mappus, R. McColl, L. Weiss, T. G. Dietterich, A. Fern, W. Wong, S. Das, A. Emmott, J. Irvine, J. Lee, D. Koutra, C. Faloutsos, D. Corkill, L. Friedland, A. Gentzel, and D. Jensen (2013), “Detecting insider threats in a real corporate database of computer usage activity,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, 2013, pp. 1393-1401. [WEBSITE] [DOI] [BIBTEX]
      Acmid = {2488213},
      Address = {New York, NY, USA},
      Author = {Senator, Ted E. and Goldberg, Henry G. and Memory, Alex and Young, William T. and Rees, Brad and Pierce, Robert and Huang, Daniel and Reardon, Matthew and Bader, David A. and Chow, Edmond and Essa, Irfan and Jones, Joshua and Bettadapura, Vinay and Chau, Duen Horng and Green, Oded and Kaya, Oguz and Zakrzewska, Anita and Briscoe, Erica and Mappus, Rudolph IV L. and McColl, Robert and Weiss, Lora and Dietterich, Thomas G. and Fern, Alan and Wong, Weng--Keen and Das, Shubhomoy and Emmott, Andrew and Irvine, Jed and Lee, Jay-Yoon and Koutra, Danai and Faloutsos, Christos and Corkill, Daniel and Friedland, Lisa and Gentzel, Amanda and Jensen, David},
      Booktitle = {{Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining}},
      Date-Added = {2013-10-22 22:29:23 +0000},
      Date-Modified = {2014-05-16 20:10:57 +0000},
      Doi = {10.1145/2487575.2488213},
      Isbn = {978-1-4503-2174-7},
      Location = {Chicago, Illinois, USA},
      Month = {September},
      Numpages = {9},
      Pages = {1393--1401},
      Publisher = {ACM},
      Series = {KDD '13},
      Title = {Detecting insider threats in a real corporate database of computer usage activity},
      Url = {},
      Year = {2013},
      Bdsk-Url-1 = {},
      Bdsk-Url-2 = {}}


This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations’ information systems. Our system combines structural and semantic information from a real corporate database of monitored activity on their users’ computers to detect independently developed red team inserts of malicious insider activities. We have developed and applied multiple algorithms for anomaly detection based on suspected scenarios of malicious insider behavior, indicators of unusual activities, high-dimensional statistical patterns, temporal sequences, and normal graph evolution. Algorithms and representations for dynamic graph processing provide the ability to scale as needed for enterprise-level deployments on real-time data streams. We have also developed a visual language for specifying combinations of features, baselines, peer groups, time periods, and algorithms to detect anomalies suggestive of instances of insider threat behavior. We defined over 100 data features in seven categories based on approximately 5.5 million actions per day from approximately 5,500 users. We have achieved area under the ROC curve values of up to 0.979 and lift values of 65 on the top 50 user-days identified on two months of real data.

via ACM DL Detecting insider threats in a real corporate database of computer usage activity.
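The abstract reports area under the ROC curve and lift on the top 50 user-days. The sketch below shows one common way to compute both metrics from per-item anomaly scores. Treating each item as a user-day with a binary label for "contains a red-team insert" is an assumption about the setup, and the function names are hypothetical.

```python
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Precision among the k highest-scored items divided by the overall base rate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    precision_k = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return precision_k / base_rate
```

The pairwise AUC loop is quadratic and only suits small examples; at the scale described (millions of actions, thousands of users) a rank-based formula is the usual choice. A lift of 65 means the top-ranked user-days contained inserts about 65 times more often than a random user-day would.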

At ICVSS (International Computer Vision Summer School) 2013, in Calabria, ITALY (July 2013)

July 11th, 2013 Irfan Essa Posted in Computational Photography, Computational Photography and Video, Daniel Castro, Matthias Grundmann, Presentations, S. Hussain Raza, Vivek Kwatra

Teaching at the ICVSS 2013, in Calabria, Italy, July 2013 (Programme)

Computational Video: Post-processing Methods for Stabilization, Retargeting and Segmentation

Irfan Essa
(This work was done in collaboration with Matthias Grundmann, Daniel Castro, Vivek Kwatra, Mei Han, and S. Hussain Raza.)


We address a variety of challenges in the analysis and enhancement of computational video. We present novel post-processing methods that bridge the gap between professionally shot videos and the casually shot videos common on online video sites. Our research presents solutions to three well-defined problems: (1) video stabilization and rolling shutter removal in casually shot, uncalibrated videos; (2) content-aware video retargeting; and (3) spatio-temporal video segmentation to enable efficient video annotation. We showcase several real-world applications built on these techniques.

We start by proposing a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. We compute camera paths that are optimally partitioned into constant, linear, and parabolic segments, mimicking the camera motions employed by professional cinematographers. To achieve this, we propose a linear programming framework that minimizes the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond conventional filtering, which only suppresses high-frequency jitter. An additional challenge in videos shot on mobile phones is rolling shutter distortion. Modern CMOS cameras capture a frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. We propose a solution based on a novel mixture model of homographies parametrized by scanline blocks to correct these rolling shutter distortions. Our method neither relies on a priori knowledge of the readout time nor requires prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos. We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer.
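A one-dimensional, translation-only version of the L1-optimal path idea can be written as a linear program with SciPy's `linprog`. Absolute values are linearized with slack variables e ≥ |D_k p| for the first, second, and third finite differences. The weights `(10, 1, 100)` and the `radius` bound (a stand-in for the crop-window constraint) are illustrative assumptions; the method described above works with 2D camera motion models, not this simplification.

```python
import numpy as np
from scipy.optimize import linprog

def l1_smooth_path(c, radius, w=(10.0, 1.0, 100.0)):
    """Minimize w1|D1 p|_1 + w2|D2 p|_1 + w3|D3 p|_1 subject to |p_t - c_t| <= radius."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    # Finite-difference operators for the 1st, 2nd and 3rd derivatives.
    D = [np.diff(np.eye(n), k, axis=0) for k in (1, 2, 3)]
    sizes = [d.shape[0] for d in D]
    nvar = n + sum(sizes)          # path values p, then slacks e1, e2, e3
    cost = np.zeros(nvar)
    rows = []
    offset = n
    for d, m, wk in zip(D, sizes, w):
        cost[offset:offset + m] = wk
        slack = np.zeros((m, nvar))
        slack[:, offset:offset + m] = -np.eye(m)
        upper = slack.copy()
        upper[:, :n] = d           #  D p - e <= 0
        lower = slack.copy()
        lower[:, :n] = -d          # -D p - e <= 0
        rows += [upper, lower]
        offset += m
    A_ub = np.vstack(rows)
    b_ub = np.zeros(A_ub.shape[0])
    bounds = [(ct - radius, ct + radius) for ct in c] + [(0, None)] * sum(sizes)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

# A slow pan with alternating jitter of +/-0.5 (synthetic).
shaky = 0.1 * np.arange(30) + 0.5 * (-1.0) ** np.arange(30)
smooth = l1_smooth_path(shaky, radius=1.0)
```

Because the objective is an L1 norm, the optimum tends to have many exactly-zero differences, which is what produces the piecewise constant, linear, and parabolic segments described above rather than the low-pass blur of conventional filtering.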

We address the challenge of changing the aspect ratio of videos by proposing algorithms that retarget videos to fit the form factor of a given device without stretching or letterboxing. Our approaches use all of the screen's pixels while striving to deliver as much of the original video content as possible. First, we introduce a new algorithm that uses discontinuous seam carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. Second, we present a technique that builds on the above-mentioned video stabilization approach. We effectively automate classical pan-and-scan techniques by smoothly guiding a virtual crop window via saliency constraints.
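The building block that discontinuous spatio-temporal seam carving extends is the classic dynamic-programming seam from single-image seam carving (Avidan and Shamir). The sketch below finds and removes one minimum-energy vertical seam in a single frame. It does not implement the appearance-based temporal coherence term or the discontinuous seams described above, and the gradient-magnitude energy is an assumption.

```python
import numpy as np

def gradient_energy(gray: np.ndarray) -> np.ndarray:
    """Assumed energy: sum of absolute vertical and horizontal gradients."""
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def find_vertical_seam(energy: np.ndarray) -> list:
    """Column index per row of the cheapest 8-connected top-to-bottom path."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    back = np.zeros((h, w), dtype=int)
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)
            j = lo + int(np.argmin(cost[y - 1, lo:hi]))  # best of up to 3 parents
            back[y, x] = j
            cost[y, x] += cost[y - 1, j]
    seam = [int(np.argmin(cost[-1]))]
    for y in range(h - 1, 0, -1):
        seam.append(int(back[y, seam[-1]]))
    return seam[::-1]

def remove_seam(frame: np.ndarray, seam: list) -> np.ndarray:
    """Drop one pixel per row, shrinking the frame width by one."""
    h, w = frame.shape[:2]
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return frame[mask].reshape(h, w - 1, *frame.shape[2:])
```

Repeating find-and-remove narrows a frame one column at a time; running it independently per frame would cause visible flicker, which is the problem a temporal coherence term is meant to address.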

Finally, we introduce an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a region graph over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation as users interact with the video, enabling efficient annotation of objects within the video.
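The over-segment-then-regroup loop can be sketched with Felzenszwalb and Huttenlocher's graph-based merging criterion, on which hierarchical graph-based video segmentation of this kind is commonly built. In this simplified sketch each node carries a single scalar value and regions are compared by mean value; the original uses richer appearance descriptors, and the function names and `k` values are hypothetical.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[float, int, int]  # (weight, node_u, node_v)

class DisjointSet:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # largest merged edge weight inside each component

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int, w: float) -> None:
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = w  # edges arrive sorted, so w is the new maximum

def segment_graph(n: int, edges: Sequence[Edge], k: float) -> List[int]:
    """Merge components when the edge is no heavier than both internal thresholds."""
    ds = DisjointSet(n)
    for w, u, v in sorted(edges):
        a, b = ds.find(u), ds.find(v)
        if a != b and w <= min(ds.internal[a] + k / ds.size[a],
                               ds.internal[b] + k / ds.size[b]):
            ds.union(a, b, w)
    return [ds.find(i) for i in range(n)]

def region_graph(labels, values, edges):
    """Build the next level: one node per region, edges weighted by mean difference."""
    ids = {r: i for i, r in enumerate(sorted(set(labels)))}
    sums = [0.0] * len(ids)
    counts = [0] * len(ids)
    for node, r in enumerate(labels):
        sums[ids[r]] += values[node]
        counts[ids[r]] += 1
    means = [s / c for s, c in zip(sums, counts)]
    pairs = {(min(ids[labels[u]], ids[labels[v]]), max(ids[labels[u]], ids[labels[v]]))
             for _, u, v in edges if ids[labels[u]] != ids[labels[v]]}
    new_edges = [(abs(means[a] - means[b]), a, b) for a, b in sorted(pairs)]
    return [ids[r] for r in labels], means, new_edges

# A toy chain of six voxels forming two appearance groups.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
edges = [(abs(values[i + 1] - values[i]), i, i + 1) for i in range(5)]
level1 = segment_graph(6, edges, k=1.0)
region_ids, means, r_edges = region_graph(level1, values, edges)
level2 = segment_graph(len(means), r_edges, k=20.0)
```

Raising `k` at each level lets larger regions merge, so the successive label lists form the coarse-to-fine tree from which an application can pick a granularity.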

Part of this talk will introduce attendees to the Video Stabilizer on YouTube and the video segmentation system. Please find suitable videos to test the systems.

Part of the work described above was done at Google, where Matthias Grundmann, Vivek Kwatra, and Mei Han work and where Professor Essa is a consultant. Other parts were carried out by Matthias Grundmann, Daniel Castro, and S. Hussain Raza as part of their research as students at Georgia Tech.
