Welcome to my website (prof.irfanessa.com). Here you will find information related to my academic pursuits, including updates on my research projects, a list of publications, the classes I teach, and my collaborators and students. If you would like to contact me, please see the FAQ first. Students interested in working with me are especially encouraged to read the FAQ. My bio is also available. Use the menu bar above, or the TAGS and CATEGORIES listed in the columns, to find relevant information.
J. Deeb-Swihart, C. Polack, E. Gilbert, and I. Essa (2017), “Selfie-Presentation in Everyday Life: A Large-Scale Characterization of Selfie Contexts on Instagram,” in Proceedings of The International AAAI Conference on Web and Social Media (ICWSM), 2017. [PDF][BIBTEX]
@InProceedings{ 2017-Deeb-Swihart-SELLCSCI,
author = {Julia Deeb-Swihart and Christopher Polack and Eric
Gilbert and Irfan Essa},
booktitle = {Proceedings of The International AAAI Conference
on Web and Social Media (ICWSM)},
month = {May},
organization = {AAAI},
pdf = {http://www.cc.gatech.edu/~irfan/p/2017-Deeb-Swihart-SELLCSCI.pdf},
title = {Selfie-Presentation in Everyday Life: A Large-Scale
Characterization of Selfie Contexts on Instagram},
year = {2017}
}
Abstract
Carefully managing the presentation of self via technology is a core practice on all modern social media platforms. Recently, selfies have emerged as a new, pervasive genre of identity performance. In many ways unique, selfies bring us full circle to Goffman—blending the online and offline selves together. In this paper, we take an empirical, Goffman-inspired look at the phenomenon of selfies. We report a large-scale, mixed-method analysis of the categories in which selfies appear on Instagram—an online community comprising over 400M people. Applying computer vision and network analysis techniques to 2.5M selfies, we present a typology of emergent selfie categories which represent emphasized identity statements. To the best of our knowledge, this is the first large-scale, empirical research on selfies. We conclude, contrary to common portrayals in the press, that selfies are really quite ordinary: they project identity signals such as wealth, health and physical attractiveness common to many online media, and to offline life.
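As a rough, illustrative sketch of how emergent image categories can be discovered at small scale (this is not the paper's mixed-method pipeline; the feature matrix, cluster count, and k-means choice are assumptions for demonstration only), one could cluster precomputed image descriptors:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for visual features extracted from selfie images;
# the paper combines computer vision and network analysis, so this is only
# a toy illustration of grouping images into emergent categories.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 128))   # 1000 images x 128-dim descriptors

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_                   # cluster id per image

# Inspect cluster sizes as a rough proxy for how common each category is.
for cluster_id, count in zip(*np.unique(labels, return_counts=True)):
    print(f"cluster {cluster_id}: {count} images")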
U. Ahsan, M. D. Choudhury, and I. Essa (2017), “Towards Using Visual Attributes to Infer Image Sentiment Of Social Events,” in Proceedings of The International Joint Conference on Neural Networks, Anchorage, Alaska, US, 2017. [PDF][BIBTEX]
@InProceedings{ 2017-Ahsan-TUVAIISSE,
address = {Anchorage, Alaska, US},
author = {Unaiza Ahsan and Munmun De Choudhury and Irfan
Essa},
booktitle = {Proceedings of The International Joint Conference
on Neural Networks},
month = {May},
pdf = {http://www.cc.gatech.edu/~irfan/p/2017-Ahsan-TUVAIISSE.pdf},
publisher = {International Neural Network Society},
title = {Towards Using Visual Attributes to Infer Image
Sentiment Of Social Events},
year = {2017}
}
Abstract
Widespread and pervasive adoption of smartphones has led to instant sharing of photographs that capture events ranging from mundane to life-altering happenings. We propose to capture sentiment information of such social event images leveraging their visual content. Our method extracts an intermediate visual representation of social event images based on the visual attributes that occur in the images, going beyond sentiment-specific attributes. We map the top predicted attributes to sentiments and extract the dominant emotion associated with a picture of a social event. Unlike recent approaches, our method generalizes to a variety of social events and even to unseen events, which are not available at training time. We demonstrate the effectiveness of our approach on a challenging social event image dataset, and our method outperforms state-of-the-art approaches for classifying complex event images into sentiments.
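As a rough illustration of mapping predicted visual attributes to a sentiment label (the attribute vocabulary, the lookup table, and the majority-vote rule below are hypothetical placeholders, not the paper's learned mapping):

# Hypothetical attribute-to-sentiment lookup; the paper learns its own
# attribute representation and mapping, this table is for illustration only.
ATTRIBUTE_SENTIMENT = {
    "smiling_people": "positive",
    "crowd_cheering": "positive",
    "damaged_building": "negative",
    "empty_street": "neutral",
}

def dominant_sentiment(attribute_scores, top_k=3):
    """Map the top-k predicted attributes of an image to a sentiment label."""
    top = sorted(attribute_scores, key=attribute_scores.get, reverse=True)[:top_k]
    votes = [ATTRIBUTE_SENTIMENT.get(a, "neutral") for a in top]
    return max(set(votes), key=votes.count)   # majority vote over sentiments

scores = {"smiling_people": 0.9, "crowd_cheering": 0.7, "empty_street": 0.2}
print(dominant_sentiment(scores))   # -> "positive"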
U. Ahsan, C. Sun, J. Hays, and I. Essa (2017), “Complex Event Recognition from Images with Few Training Examples,” in IEEE Winter Conference on Applications of Computer Vision (WACV), 2017. [PDF] [arXiv][BIBTEX]
@InProceedings{ 2017-Ahsan-CERFIWTE,
arxiv = {https://arxiv.org/abs/1701.04769},
author = {Unaiza Ahsan and Chen Sun and James Hays and Irfan
Essa},
booktitle = {IEEE Winter Conference on Applications of Computer
Vision (WACV)},
month = {March},
pdf = {http://www.cc.gatech.edu/~irfan/p/2017-Ahsan-CERFIWTE.pdf},
title = {Complex Event Recognition from Images with Few
Training Examples},
year = {2017}
}
Abstract
We propose to leverage concept-level representations for complex event recognition in photographs given limited training examples. We introduce a novel framework to discover event concept attributes from the web and use them to extract semantic features from images and classify them into social event categories with few training examples. Discovered concepts include a variety of objects, scenes, actions and event subtypes, leading to a discriminative and compact representation for event images. Web images are obtained for each discovered event concept and we use (pre-trained) CNN features to train concept classifiers. Extensive experiments on challenging event datasets demonstrate that our proposed method outperforms several baselines using deep CNN features directly in classifying images into events with limited training examples. We also demonstrate that our method achieves the best overall accuracy on a dataset with unseen event categories using a single training example.
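The general recipe (train per-concept classifiers on pre-trained CNN features of web images, then use concept scores as a compact image representation) can be sketched as follows; the concept count, feature dimension, synthetic data, and linear SVM choice are assumptions for illustration, not details from the paper:

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Assume pre-trained CNN features are already extracted for web images of
# each discovered concept (positives) and for random background images.
n_concepts, n_per_concept, dim = 5, 50, 512
concept_clfs = []
for c in range(n_concepts):
    pos = rng.normal(loc=0.1 * (c + 1), size=(n_per_concept, dim))
    neg = rng.normal(size=(n_per_concept, dim))
    X = np.vstack([pos, neg])
    y = np.array([1] * n_per_concept + [0] * n_per_concept)
    concept_clfs.append(LinearSVC(C=1.0).fit(X, y))

def concept_representation(cnn_feature):
    """Score one image against every concept classifier."""
    f = cnn_feature.reshape(1, -1)
    return np.array([clf.decision_function(f)[0] for clf in concept_clfs])

# The resulting low-dimensional concept vector can then be fed to a final
# event classifier trained with only a few labeled examples per event.
print(concept_representation(rng.normal(size=dim)).round(2))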
V. Bettadapura, C. Pantofaru, and I. Essa (2016), “Leveraging Contextual Cues for Generating Basketball Highlights,” in Proceedings of ACM International Conference on Multimedia (ACM-MM), 2016. [PDF][WEBSITE] [arXiv][BIBTEX]
@InProceedings{ 2016-Bettadapura-LCCGBH,
arxiv = {http://arxiv.org/abs/1606.08955},
author = {Vinay Bettadapura and Caroline Pantofaru and Irfan
Essa},
booktitle = {Proceedings of ACM International Conference on
Multimedia (ACM-MM)},
month = {October},
organization = {ACM},
pdf = {http://www.cc.gatech.edu/~irfan/p/2016-Bettadapura-LCCGBH.pdf},
title = {Leveraging Contextual Cues for Generating
Basketball Highlights},
url = {http://www.vbettadapura.com/highlights/basketball/index.htm},
year = {2016}
}
Abstract
The massive growth of sports videos has resulted in a need for automatic generation of sports highlights that are comparable in quality to the hand-edited highlights produced by broadcasters such as ESPN. Unlike previous works that mostly use audio-visual cues derived from the video, we propose an approach that additionally leverages contextual cues derived from the environment that the game is being played in. The contextual cues provide information about the excitement levels in the game, which can be ranked and selected to automatically produce high-quality basketball highlights. We introduce a new dataset of 25 NCAA games along with their play-by-play stats and the ground-truth excitement data for each basket. We explore the informativeness of five different cues derived from the video and from the environment through user studies. Our experiments show that for our study participants, the highlights produced by our system are comparable to the ones produced by ESPN for the same games.
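The selection step boils down to scoring each candidate clip's excitement from several cues and keeping the top-ranked clips; the cue names, weights, and clips below are fabricated for illustration and do not reflect the paper's learned ranking:

# Illustrative ranking of game clips by a weighted excitement score.
CUE_WEIGHTS = {"crowd_noise": 0.4, "score_differential": 0.3,
               "commentator_pitch": 0.2, "play_importance": 0.1}

def excitement(clip_cues):
    return sum(CUE_WEIGHTS[c] * v for c, v in clip_cues.items())

clips = [
    {"id": "clip_01", "crowd_noise": 0.9, "score_differential": 0.2,
     "commentator_pitch": 0.8, "play_importance": 0.7},
    {"id": "clip_02", "crowd_noise": 0.3, "score_differential": 0.1,
     "commentator_pitch": 0.2, "play_importance": 0.4},
]

ranked = sorted(
    clips,
    key=lambda c: excitement({k: v for k, v in c.items() if k != "id"}),
    reverse=True,
)
print([c["id"] for c in ranked[:1]])   # top-ranked clip(s) form the highlight reel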
A. Zia, Y. Sharma, V. Bettadapura, E. Sarin, M. Clements, and I. Essa (2015), “Automated Assessment of Surgical Skills Using Frequency Analysis,” in International Conference on Medical Image Computing and Computer Assisted Interventions (MICCAI), 2015. [PDF][BIBTEX]
@InProceedings{ 2015-Zia-AASSUFA,
author = {A. Zia and Y. Sharma and V. Bettadapura and E.
Sarin and M. Clements and I. Essa},
booktitle = {International Conference on Medical Image Computing
and Computer Assisted Interventions (MICCAI)},
month = {October},
pdf = {http://www.cc.gatech.edu/~irfan/p/2015-Zia-AASSUFA.pdf},
title = {Automated Assessment of Surgical Skills Using
Frequency Analysis},
year = {2015}
}
Abstract
We present an automated framework for the visual assessment of the expertise level of surgeons using the OSATS (Objective Structured Assessment of Technical Skills) criteria. A video analysis technique for extracting motion quality via frequency coefficients is introduced. The framework is tested in a case study that involved analysis of videos of medical students with different expertise levels performing basic surgical tasks in a surgical training lab setting. We demonstrate that transforming the sequential time data into frequency components effectively extracts the useful information differentiating between different skill levels of the surgeons. The results show significant performance improvements using DFT and DCT coefficients over known state-of-the-art techniques.
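A minimal sketch of the frequency-coefficient idea, assuming a 1-D motion time series (for example, tool or hand speed per frame): take the DCT and keep the low-order coefficients as features. The signal, window length, and number of coefficients here are arbitrary and not taken from the paper.

import numpy as np
from scipy.fftpack import dct

def frequency_features(motion_signal, n_coeffs=20):
    """DCT of a motion time series, truncated to the first n_coeffs terms."""
    coeffs = dct(np.asarray(motion_signal, dtype=float), norm="ortho")
    return coeffs[:n_coeffs]

# Toy example: a smooth (expert-like) signal vs. a jittery (novice-like) one.
t = np.linspace(0, 10, 500)
smooth = np.sin(t)
jittery = np.sin(t) + 0.5 * np.random.default_rng(0).normal(size=t.size)

# A downstream classifier would be trained on such truncated-coefficient
# feature vectors, one per video or time window.
print(frequency_features(smooth).round(2))
print(frequency_features(jittery).round(2))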
E. Thomaz, I. Essa, and G. D. Abowd (2015), “A Practical Approach for Recognizing Eating Moments with Wrist-Mounted Inertial Sensing,” in Proceedings of ACM International Conference on Ubiquitous Computing (UBICOMP), 2015. [PDF][BIBTEX]
@InProceedings{ 2015-Thomaz-PAREMWWIS,
author = {Edison Thomaz and Irfan Essa and Gregory D. Abowd},
booktitle = {Proceedings of ACM International Conference on
Ubiquitous Computing (UBICOMP)},
month = {September},
pdf = {http://www.cc.gatech.edu/~irfan/p/2015-Thomaz-PAREMWWIS.pdf},
title = {A Practical Approach for Recognizing Eating Moments
with Wrist-Mounted Inertial Sensing},
year = {2015}
}
Abstract
Recognizing when eating activities take place is one of the key challenges in automated food intake monitoring. Despite progress over the years, most proposed approaches have been largely impractical for everyday use, requiring multiple on-body sensors or specialized devices such as neck collars for swallow detection. In this paper, we describe the implementation and evaluation of an approach for inferring eating moments based on 3-axis accelerometry collected with a popular off-the-shelf smartwatch. Trained with data collected in a semi-controlled laboratory setting with 20 subjects, our system recognized eating moments in two free-living condition studies (7 participants, 1 day; 1 participant, 31 days), with F-scores of 76.1% (66.7% Precision, 88.8% Recall) and 71.3% (65.2% Precision, 78.6% Recall). This work represents a contribution towards the implementation of a practical, automated system for everyday food intake monitoring, with applicability in areas ranging from health research to food journaling.
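The pipeline implied above (window the 3-axis accelerometer stream, compute per-window statistics, classify each window) can be sketched as follows; the window length, feature set, synthetic data, and random forest are stand-ins, not the paper's exact configuration:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc_xyz, window=250):
    """Split an (N, 3) accelerometer stream into windows of simple statistics."""
    feats = []
    for start in range(0, len(acc_xyz) - window + 1, window):
        w = acc_xyz[start:start + window]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0),
                                     w.min(axis=0), w.max(axis=0)]))
    return np.array(feats)

rng = np.random.default_rng(0)
# Synthetic stand-ins for "eating" and "non-eating" wrist motion.
eating = rng.normal(0.0, 1.0, size=(5000, 3))
other = rng.normal(0.0, 0.3, size=(5000, 3))

X = np.vstack([window_features(eating), window_features(other)])
y = np.array([1] * (len(X) // 2) + [0] * (len(X) - len(X) // 2))

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(window_features(rng.normal(0.0, 1.0, size=(500, 3)))))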
D. Castro, S. Hickson, V. Bettadapura, E. Thomaz, G. Abowd, H. Christensen, and I. Essa (2015), “Predicting Daily Activities from Egocentric Images Using Deep Learning,” in Proceedings of International Symposium on Wearable Computers (ISWC), 2015. [PDF][WEBSITE] [arXiv][BIBTEX]
@InProceedings{ 2015-Castro-PDAFEIUDL,
arxiv = {http://arxiv.org/abs/1510.01576},
author = {Daniel Castro and Steven Hickson and Vinay
Bettadapura and Edison Thomaz and Gregory Abowd and
Henrik Christensen and Irfan Essa},
booktitle = {Proceedings of International Symposium on Wearable
Computers (ISWC)},
month = {September},
pdf = {http://www.cc.gatech.edu/~irfan/p/2015-Castro-PDAFEIUDL.pdf},
title = {Predicting Daily Activities from Egocentric Images
Using Deep Learning},
url = {http://www.cc.gatech.edu/cpl/projects/dailyactivities/},
year = {2015}
}
Abstract
We present a method to analyze images taken from a passive egocentric wearable camera, along with contextual information such as the time and day of the week, to learn and predict the everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6-month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person’s activity across the 19 activity classes. We also demonstrate some promising results from two additional users by fine-tuning the classifier with one day of training data.
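The late fusion idea, combining per-image CNN class probabilities with contextual features (hour of day, day of week) in a second-stage classifier, can be sketched like this; the synthetic data and the logistic-regression fusion stage are illustrative assumptions, not the paper's exact ensemble:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_classes = 400, 19

# Stand-ins for per-image CNN softmax outputs and contextual features,
# which a real system would compute upstream from the egocentric images.
cnn_probs = rng.dirichlet(np.ones(n_classes), size=n_samples)
context = np.column_stack([rng.integers(0, 24, n_samples),   # hour of day
                           rng.integers(0, 7, n_samples)])   # day of week
labels = rng.integers(0, n_classes, n_samples)

# Late fusion: concatenate the two sources and train a second-stage model.
fused = np.hstack([cnn_probs, context])
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(clf.predict(fused[:5]))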
S. Hickson, I. Essa, and H. Christensen (2015), “Semantic Instance Labeling Leveraging Hierarchical Segmentation,” in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), 2015. [PDF] [DOI][BIBTEX]
@InProceedings{ 2015-Hickson-SILLHS,
author = {Steven Hickson and Irfan Essa and Henrik
Christensen},
booktitle = {Proceedings of IEEE Winter Conference on
Applications of Computer Vision (WACV)},
doi = {10.1109/WACV.2015.147},
month = {January},
pdf = {http://www.cc.gatech.edu/~irfan/p/2015-Hickson-SILLHS.pdf},
publisher = {IEEE Computer Society},
title = {Semantic Instance Labeling Leveraging Hierarchical
Segmentation},
year = {2015}
}
Abstract
Most approaches for indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier. In this paper, we implement a higher-level segmentation using a hierarchy of superpixels to obtain a better segmentation for training our classifier. By focusing on meaningful segments that conform more directly to objects, regardless of size, we train a random forest of decision trees as a classifier using simple features such as the 3D size, LAB color histogram, width, height, and shape as specified by a histogram of surface normals. We test our method on the NYU V2 depth dataset, a challenging dataset of cluttered indoor environments. Our experiments on this dataset show that our method achieves state-of-the-art results on both the general semantic labeling introduced by the dataset (floor, structure, furniture, and objects) and a more object-specific semantic labeling. We show that training a classifier on a segmentation from a hierarchy of superpixels yields better results than training directly on superpixels, patches, or pixels as in previous work.
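Training the segment-level classifier reduces to building one feature vector per segment (3D size, width, height, LAB color histogram, surface-normal histogram) and fitting a random forest; the sketch below fakes the per-segment descriptors rather than computing them from RGBD data, so treat it as an illustration of the classification step only:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def fake_segment_feature():
    """Stand-in for a real per-segment descriptor computed from RGBD data:
    [3D size, width, height] + LAB histogram + surface-normal histogram."""
    geometry = rng.uniform(0, 1, size=3)
    lab_hist = rng.dirichlet(np.ones(24))
    normal_hist = rng.dirichlet(np.ones(12))
    return np.concatenate([geometry, lab_hist, normal_hist])

X = np.array([fake_segment_feature() for _ in range(500)])
y = rng.integers(0, 4, size=500)   # floor / structure / furniture / objects

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(forest.predict(X[:5]))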
S. H. Raza, A. Humayun, M. Grundmann, D. Anderson, and I. Essa (2015), “Finding Temporally Consistent Occlusion Boundaries using Scene Layout,” in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), 2015. [PDF] [DOI][BIBTEX]
@InProceedings{ 2015-Raza-FTCOBUSL,
author = {Syed Hussain Raza and Ahmad Humayun and Matthias
Grundmann and David Anderson and Irfan Essa},
booktitle = {Proceedings of IEEE Winter Conference on
Applications of Computer Vision (WACV)},
doi = {10.1109/WACV.2015.141},
month = {January},
pdf = {http://www.cc.gatech.edu/~irfan/p/2015-Raza-FTCOBUSL.pdf},
publisher = {IEEE Computer Society},
title = {Finding Temporally Consistent Occlusion Boundaries
using Scene Layout},
year = {2015}
}
Abstract
We present an algorithm for finding temporally consistent occlusion boundaries in videos to support segmentation of dynamic scenes. We learn occlusion boundaries in a pairwise Markov random field (MRF) framework. We first estimate the probability of a spatiotemporal edge being an occlusion boundary by using appearance, flow, and geometric features. Next, we enforce occlusion boundary continuity in an MRF model by learning pairwise occlusion probabilities using a random forest. Then, we temporally smooth boundaries to remove temporal inconsistencies in occlusion boundary estimation. Our proposed framework provides an efficient approach for finding temporally consistent occlusion boundaries in video by utilizing causality, redundancy in videos, and semantic layout of the scene. We have developed a dataset with fully annotated ground-truth occlusion boundaries of over 30 videos (∼5000 frames). This dataset is used to evaluate temporal occlusion boundaries and provides a much-needed baseline for future studies. We perform experiments to demonstrate the role of scene layout and temporal information for occlusion reasoning in videos of dynamic scenes.
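The temporal smoothing step can be illustrated in isolation: given per-frame occlusion-boundary probabilities for one spatiotemporal edge, average them over a short temporal window to suppress flicker. The window size and probabilities are arbitrary, and the paper's full pipeline (MRF with learned pairwise terms) is not reproduced here.

import numpy as np

def temporally_smooth(boundary_probs, window=5):
    """Moving-average smoothing of per-frame occlusion-boundary probabilities."""
    probs = np.asarray(boundary_probs, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(probs, kernel, mode="same")

# One spatiotemporal edge whose per-frame probability flickers.
raw = np.array([0.9, 0.1, 0.8, 0.85, 0.2, 0.9, 0.95, 0.1, 0.9])
print(temporally_smooth(raw).round(2))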
V. Bettadapura, E. Thomaz, A. Parnami, G. Abowd, and I. Essa (2015), “Leveraging Context to Support Automated Food Recognition in Restaurants,” in Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV), 2015. [PDF][WEBSITE] [DOI] [arXiv][BIBTEX]
@InProceedings{ 2015-Bettadapura-LCSAFRR,
arxiv = {http://arxiv.org/abs/1510.02078},
author = {Vinay Bettadapura and Edison Thomaz and Aman
Parnami and Gregory Abowd and Irfan Essa},
booktitle = {Proceedings of IEEE Winter Conference on
Applications of Computer Vision (WACV)},
doi = {10.1109/WACV.2015.83},
month = {January},
pdf = {http://www.cc.gatech.edu/~irfan/p/2015-Bettadapura-LCSAFRR.pdf},
publisher = {IEEE Computer Society},
title = {Leveraging Context to Support Automated Food
Recognition in Restaurants},
url = {http://www.vbettadapura.com/egocentric/food/},
year = {2015}
}
Abstract
The pervasiveness of mobile cameras has resulted in a dramatic increase in food photos, which are pictures reflecting what people eat. In this paper, we study how taking pictures of what we eat in restaurants can be used for the purpose of automating food journaling. We propose to leverage the context of where the picture was taken, along with additional information about the restaurant available online, coupled with state-of-the-art computer vision techniques, to recognize the food being consumed. To this end, we demonstrate image-based recognition of foods eaten in restaurants by training a classifier with images from restaurants’ online menu databases. We evaluate the performance of our system in unconstrained, real-world settings with food images taken in 10 restaurants across 5 different types of food (American, Indian, Italian, Mexican, and Thai).
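The use of context amounts to restricting the classifier's label space to the menu of the restaurant where the photo was taken; the restaurant names, menus, and scores below are fabricated for illustration and are not from the paper's data:

# Illustrative use of location context: only consider dishes that appear on
# the menu of the restaurant where the photo was taken (all data made up).
MENUS = {
    "thai_place": ["pad_thai", "green_curry", "mango_sticky_rice"],
    "italian_place": ["margherita_pizza", "lasagna", "tiramisu"],
}

def recognize_food(classifier_scores, restaurant):
    """Pick the highest-scoring dish among those on the restaurant's menu."""
    menu = set(MENUS.get(restaurant, []))
    candidates = {d: s for d, s in classifier_scores.items() if d in menu}
    return max(candidates, key=candidates.get) if candidates else None

scores = {"pad_thai": 0.62, "lasagna": 0.70, "green_curry": 0.55}
print(recognize_food(scores, "thai_place"))   # -> "pad_thai"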