Paper in IJCNN (2017) “Towards Using Visual Attributes to Infer Image Sentiment Of Social Events”

May 18th, 2017 Irfan Essa Posted in Computational Journalism, Computational Photography and Video, Computer Vision, Machine Learning, Papers, Unaiza Ahsan

Paper

  • U. Ahsan, M. D. Choudhury, and I. Essa (2017), “Towards Using Visual Attributes to Infer Image Sentiment Of Social Events,” in Proceedings of The International Joint Conference on Neural Networks, Anchorage, Alaska, US, 2017. [PDF] [BIBTEX]
    @InProceedings{    2017-Ahsan-TUVAIISSE,
      address  = {Anchorage, Alaska, US},
      author  = {Unaiza Ahsan and Munmun De Choudhury and Irfan
          Essa},
      booktitle  = {Proceedings of The International Joint Conference
          on Neural Networks},
      month    = {May},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2017-Ahsan-TUVAIISSE.pdf},
      publisher  = {International Neural Network Society},
      title    = {Towards Using Visual Attributes to Infer Image
          Sentiment Of Social Events},
      year    = {2017}
    }

Abstract

Widespread and pervasive adoption of smartphones has led to instant sharing of photographs that capture events ranging from mundane to life-altering happenings. We propose to capture the sentiment information of such social event images by leveraging their visual content. Our method extracts an intermediate visual representation of social event images based on the visual attributes that occur in the images, going beyond sentiment-specific attributes. We map the top predicted attributes to sentiments and extract the dominant emotion associated with a picture of a social event. Unlike recent approaches, our method generalizes to a variety of social events and even to unseen events that are not available at training time. We demonstrate the effectiveness of our approach on a challenging social event image dataset, and our method outperforms state-of-the-art approaches for classifying complex event images into sentiments.
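
As a rough illustration of the attribute-to-sentiment mapping described above, here is a small Python sketch that turns the top predicted visual attributes of an event image into a dominant emotion via an attribute-to-emotion lexicon. The attribute names, scores, and lexicon entries are invented placeholders; the paper's actual attribute detectors and mapping are not reproduced here.

    # Illustrative sketch only: map top predicted attributes to a dominant
    # emotion through a hand-made attribute-to-emotion lexicon.
    from collections import defaultdict

    # Hypothetical attribute probabilities from an attribute classifier.
    predicted_attributes = {
        "smiling_people": 0.91, "balloons": 0.78, "crowd": 0.64,
        "rubble": 0.05, "flames": 0.02,
    }

    # Hypothetical lexicon mapping attributes to (emotion, weight) pairs.
    attribute_to_emotions = {
        "smiling_people": [("joy", 1.0)],
        "balloons": [("joy", 0.8), ("anticipation", 0.4)],
        "crowd": [("anticipation", 0.5)],
        "rubble": [("sadness", 1.0)],
        "flames": [("fear", 0.9), ("anger", 0.5)],
    }

    def dominant_emotion(attr_probs, lexicon, top_k=3):
        """Accumulate emotion scores from the top-k predicted attributes."""
        top = sorted(attr_probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        scores = defaultdict(float)
        for attr, prob in top:
            for emotion, weight in lexicon.get(attr, []):
                scores[emotion] += prob * weight
        return max(scores, key=scores.get) if scores else None

    print(dominant_emotion(predicted_attributes, attribute_to_emotions))  # "joy"

Because the lexicon here is defined over generic attributes rather than event-specific categories, the same mapping can be applied to images of unseen event types, which mirrors the generalization property the abstract emphasizes.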


Presentation at the Machine Learning Center at GA Tech on “The New Machine Learning Center at GA Tech: Plans and Aspirations”

March 1st, 2017 Irfan Essa Posted in Machine Learning, Presentations

Machine Learning at Georgia Tech Seminar Series

Speaker: Irfan Essa
Date/Time: March 1, 2017, 12 noon

Abstract

The Interdisciplinary Research Center (IRC) for Machine Learning at Georgia Tech (ML@GT) was established in Summer 2016 to foster research and academic activities in and around the discipline of Machine Learning. This center aims to create a community that leverages true cross-disciplinarity across all units on campus, establishes a home for the thought leaders in the area of Machine Learning, and creates programs to train the next generation of pioneers. In this talk, I will introduce the center, describe how we got here, attempt to outline the goals of this center, and lay out its foundational, application, and educational thrusts. The primary purpose of this talk is to solicit feedback about these technical thrusts, which will be the areas we hope to focus on in the upcoming years. I will also describe, in brief, the new Ph.D. program that has been proposed and is pending approval. We will discuss upcoming events and plans for the future.

https://mediaspace.gatech.edu/media/essa/1_gfu6t21y


Announcing the new Interdisciplinary Research Center for Machine Learning at Georgia Tech (ML@GT)

October 6th, 2016 Irfan Essa Posted in In The News, Interesting, Machine Learning

Announcement from Georgia Tech’s College of Computing about a new Interdisciplinary Research Center for Machine Learning (ML@GT), which I will be serving as the Inaugural Director of.

Machine Learning @ Georgia Tech

Based in the College of Computing, ML@GT represents all of Georgia Tech. It is tasked with pushing forward the ability for computers to learn from observations and data. As one of the fastest growing research areas in computing, machine learning spans many disciplines that use data to discover scientific principles, infer patterns, and extract meaningful knowledge.

According to School of Interactive Computing Professor Irfan Essa, inaugural director of ML@GT, machine learning (ML) has reached a new level of maturity and is now impacting all aspects of computing, engineering, science, and business. “We are in the era of aggregation, of collecting data,” said Essa. “However, machine learning is now propelling data analysis, and the whole concept of interpreting that data, toward a new era of making sense of the data, using it to make meaningful connections between information, and acting upon it in innovative ways that bring the most benefit to the most people.”

The new center begins with more than 100 affiliated faculty members from five Georgia Tech colleges and the Georgia Tech Research Institute, as well as some jointly affiliated with Emory University.

Source: Two New Interdisciplinary Research Centers Shaping Future of Computing | Georgia Tech – College of Computing


Presentation at Max-Planck-Institute for Intelligent Systems in Tübingen (2015): “Data-Driven Methods for Video Analysis and Enhancement”

September 10th, 2015 Irfan Essa Posted in Computational Photography and Video, Computer Vision, Machine Learning, Presentations

Data-Driven Methods for Video Analysis and Enhancement

Irfan Essa (prof.irfanessa.com)
Georgia Institute of Technology

Thursday, September 10, 2 pm,
Max Planck House Lecture Hall (Spemannstr. 36)
Hosted by Max-Planck-Institute for Intelligent Systems (Michael Black, Director of Perceiving Systems)

Abstract

In this talk, I will start by describing the pervasiveness of image and video content and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for the analysis and enhancement of video content. I will then cover some of our earlier work on temporal modeling of video and lead up to some of our current work, describing two main projects: (1) our approach to video stabilization, currently implemented and running on YouTube, and its extensions, and (2) a robust and scalable method for video segmentation.

I will describe, in some detail, our video stabilization method, which generates stabilized videos and is in wide use. Our method allows for video stabilization beyond the conventional filtering that only suppresses high-frequency jitter. It also supports the removal of rolling shutter distortions common in modern CMOS cameras, which capture the frame one scan-line at a time, resulting in non-rigid image distortions such as shear and wobble. Our method does not rely on a priori knowledge and works on video from any camera or on legacy footage. I will showcase examples of this approach and also discuss how this method is launched and running on YouTube, with millions of users.
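
The snippet below is not the YouTube stabilizer itself, only a minimal OpenCV sketch of the underlying idea: estimate a per-frame camera path from tracked features, smooth that path, and warp each frame by the correction. File names and parameters are placeholders, and rolling shutter removal is not handled here.

    # Minimal camera-path smoothing sketch (an illustration of the general idea,
    # not the production stabilizer): estimate per-frame similarity transforms
    # from tracked features, low-pass filter the accumulated trajectory, and
    # warp each frame by the difference between smoothed and original paths.
    import cv2
    import numpy as np

    def stabilize(in_path="input.mp4", out_path="stabilized.mp4", radius=15):
        cap = cv2.VideoCapture(in_path)
        ok, prev = cap.read()
        if not ok:
            raise IOError("cannot read " + in_path)
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        transforms, frames = [], [prev]
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                          qualityLevel=0.01, minDistance=30)
            nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            good = status.flatten() == 1
            m, _ = cv2.estimateAffinePartial2D(pts[good], nxt[good])
            transforms.append((m[0, 2], m[1, 2], np.arctan2(m[1, 0], m[0, 0])))
            frames.append(frame)
            prev_gray = gray
        # Accumulate the camera trajectory and smooth it with a moving average.
        trajectory = np.cumsum(np.array(transforms), axis=0)
        kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
        smoothed = np.column_stack([np.convolve(trajectory[:, i], kernel, mode="same")
                                    for i in range(3)])
        correction = smoothed - trajectory
        h, w = frames[0].shape[:2]
        fps = cap.get(cv2.CAP_PROP_FPS) or 30
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        writer.write(frames[0])
        for frame, (dx, dy, da), (cx, cy, ca) in zip(frames[1:], transforms, correction):
            x, y, a = dx + cx, dy + cy, da + ca
            warp = np.array([[np.cos(a), -np.sin(a), x],
                             [np.sin(a),  np.cos(a), y]])
            writer.write(cv2.warpAffine(frame, warp, (w, h)))
        writer.release()

Smoothing the accumulated camera path, rather than only differencing consecutive frames, is what separates path-based stabilization from simple high-frequency jitter filtering.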

I will then describe an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. This hierarchical approach generates high-quality segmentations, and we demonstrate the use of this segmentation as users interact with the video, enabling efficient annotation of objects within the video. I will also show some recent work on how this segmentation and annotation can be used for dynamic scene understanding.
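
As a loose, per-frame proxy for the graph-based building block (the actual method builds a spatiotemporal hierarchy over the whole video), the sketch below runs Felzenszwalb's graph-based segmentation from scikit-image on a single frame at several scale settings to mimic coarse-to-fine hierarchy levels; the image and parameters are placeholders.

    # Rough per-frame illustration only: graph-based segmentation at several
    # scales, standing in for the levels of a spatiotemporal hierarchy.
    from skimage import data, segmentation

    frame = data.astronaut()  # stand-in for a video frame
    hierarchy = {scale: segmentation.felzenszwalb(frame, scale=scale,
                                                  sigma=0.8, min_size=50)
                 for scale in (50, 200, 800)}  # finer -> coarser levels
    for scale, labels in hierarchy.items():
        print(f"scale={scale}: {labels.max() + 1} regions")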

I will then follow up with some recent work on image and video analysis in the mobile domain. I will also make some observations about the ubiquity of imaging and video in general and the need for better tools for video analysis.


Paper in Ubicomp 2015: “A Practical Approach for Recognizing Eating Moments with Wrist-Mounted Inertial Sensing”

September 8th, 2015 Irfan Essa Posted in ACM UIST/CHI, Activity Recognition, Behavioral Imaging, Edison Thomaz, Gregory Abowd, Health Systems, Machine Learning, Mobile Computing, Papers, UBICOMP, Ubiquitous Computing

Paper

  • E. Thomaz, I. Essa, and G. D. Abowd (2015), “A Practical Approach for Recognizing Eating Moments with Wrist-Mounted Inertial Sensing,” in Proceedings of ACM International Conference on Ubiquitous Computing (UBICOMP), 2015. [PDF] [BIBTEX]
    @InProceedings{    2015-Thomaz-PAREMWWIS,
      author  = {Edison Thomaz and Irfan Essa and Gregory D. Abowd},
      booktitle  = {Proceedings of ACM International Conference on
          Ubiquitous Computing (UBICOMP)},
      month    = {September},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2015-Thomaz-PAREMWWIS.pdf},
      title    = {A Practical Approach for Recognizing Eating Moments
          with Wrist-Mounted Inertial Sensing},
      year    = {2015}
    }

Abstract

Recognizing when eating activities take place is one of the key challenges in automated food intake monitoring. Despite progress over the years, most proposed approaches have been largely impractical for everyday usage, requiring multiple on-body sensors or specialized devices such as neck collars for swallow detection. In this paper, we describe the implementation and evaluation of an approach for inferring eating moments based on 3-axis accelerometry collected with a popular off-the-shelf smartwatch. Trained with data collected in a semi-controlled laboratory setting with 20 subjects, our system recognized eating moments in two free-living condition studies (7 participants, 1 day; 1 participant, 31 days), with F-scores of 76.1% (66.7% Precision, 88.8% Recall) and 71.3% (65.2% Precision, 78.6% Recall). This work represents a contribution towards the implementation of a practical, automated system for everyday food intake monitoring, with applicability in areas ranging from health research to food journaling.
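
The general recipe behind this kind of system can be sketched as follows: window the 3-axis accelerometer stream, extract simple per-window statistics, train a classifier, and report precision, recall, and F-score. The window length, features, classifier, and synthetic data below are assumptions for illustration, not the paper's configuration.

    # Sketch of the generic pipeline: windowed accelerometer features +
    # a random forest, evaluated with precision/recall/F-score.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import precision_recall_fscore_support
    from sklearn.model_selection import train_test_split

    def window_features(acc, labels, fs=25, win_s=6):
        """acc: (N, 3) accelerometer samples at fs Hz; labels: (N,), 1 = eating."""
        win = fs * win_s
        X, y = [], []
        for start in range(0, len(acc) - win, win):
            seg = acc[start:start + win]
            X.append(np.concatenate([seg.mean(0), seg.std(0),
                                     np.abs(np.diff(seg, axis=0)).mean(0)]))
            y.append(int(labels[start:start + win].mean() > 0.5))
        return np.array(X), np.array(y)

    # Synthetic stand-in data; real data would come from a smartwatch IMU.
    rng = np.random.default_rng(0)
    acc = rng.normal(size=(25 * 3600, 3))            # one hour at 25 Hz
    labels = np.zeros(len(acc), dtype=int)
    labels[20000:30000] = 1                          # a synthetic "meal" episode
    acc[20000:30000] += rng.normal(1.0, 0.2, size=(10000, 3))

    X, y = window_features(acc, labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    p, r, f, _ = precision_recall_fscore_support(y_te, clf.predict(X_te),
                                                 average="binary", zero_division=0)
    print(f"precision={p:.2f} recall={r:.2f} F-score={f:.2f}")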


Paper in ISWC 2015: “Predicting Daily Activities from Egocentric Images Using Deep Learning”

September 7th, 2015 Irfan Essa Posted in Activity Recognition, Daniel Castro, Gregory Abowd, Henrik Christensen, ISWC, Machine Learning, Papers, Steven Hickson, Ubiquitous Computing, Vinay Bettadapura

Paper

  • D. Castro, S. Hickson, V. Bettadapura, E. Thomaz, G. Abowd, H. Christensen, and I. Essa (2015), “Predicting Daily Activities from Egocentric Images Using Deep Learning,” in Proceedings of International Symposium on Wearable Computers (ISWC), 2015. [PDF] [WEBSITE] [arXiv] [BIBTEX]
    @InProceedings{    2015-Castro-PDAFEIUDL,
      arxiv    = {http://arxiv.org/abs/1510.01576},
      author  = {Daniel Castro and Steven Hickson and Vinay
          Bettadapura and Edison Thomaz and Gregory Abowd and
          Henrik Christensen and Irfan Essa},
      booktitle  = {Proceedings of International Symposium on Wearable
          Computers (ISWC)},
      month    = {September},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2015-Castro-PDAFEIUDL.pdf},
      title    = {Predicting Daily Activities from Egocentric Images
          Using Deep Learning},
      url    = {http://www.cc.gatech.edu/cpl/projects/dailyactivities/},
      year    = {2015}
    }

Abstract

We present a method to analyze images taken from a passive egocentric wearable camera along with contextual information, such as time and the day of the week, to learn and predict the everyday activities of an individual. We collected a dataset of 40,103 egocentric images over a 6-month period with 19 activity classes and demonstrate the benefit of state-of-the-art deep learning techniques for learning and predicting daily activities. Classification is conducted using a Convolutional Neural Network (CNN) with a classification method we introduce called a late fusion ensemble. This late fusion ensemble incorporates relevant contextual information and increases our classification accuracy. Our technique achieves an overall accuracy of 83.07% in predicting a person’s activity across the 19 activity classes. We also demonstrate some promising results from two additional users by fine-tuning the classifier with one day of training data.
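
"Late fusion" refers to combining classifier outputs rather than raw features. The sketch below shows one common form of late fusion, a weighted average of class-probability vectors from an image model and a context model; the weights and stand-in probabilities are illustrative, not the ensemble learned in the paper.

    # One common late-fusion scheme (illustrative, not the paper's exact
    # ensemble): weighted averaging of class probabilities from an image
    # CNN and a context (time/day) classifier.
    import numpy as np

    def late_fusion(p_image, p_context, w_image=0.7):
        """Weighted average of two probability distributions over activity classes."""
        fused = w_image * p_image + (1.0 - w_image) * p_context
        return fused / fused.sum(axis=-1, keepdims=True)

    n_classes = 19
    rng = np.random.default_rng(0)
    p_image = rng.dirichlet(np.ones(n_classes))    # stand-in CNN softmax output
    p_context = rng.dirichlet(np.ones(n_classes))  # stand-in context-model output
    print("predicted class:", int(np.argmax(late_fusion(p_image, p_context))))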


Paper in ACM IUI15: “Inferring Meal Eating Activities in Real World Settings from Ambient Sounds: A Feasibility Study”

April 1st, 2015 Irfan Essa Posted in ACM ICMI/IUI, Activity Recognition, Audio Analysis, Behavioral Imaging, Edison Thomaz, Gregory Abowd, Health Systems, Machine Learning, Multimedia

Paper

  • E. Thomaz, C. Zhang, I. Essa, and G. D. Abowd (2015), “Inferring Meal Eating Activities in Real World Settings from Ambient Sounds: A Feasibility Study,” in Proceedings of ACM Conference on Intelligent User Interfaces (IUI), 2015. (Best Short Paper Award) [PDF] [BIBTEX]
    @InProceedings{    2015-Thomaz-IMEARWSFASFS,
      author  = {Edison Thomaz and Cheng Zhang and Irfan Essa and
          Gregory D. Abowd},
      awards  = {(Best Short Paper Award)},
      booktitle  = {Proceedings of ACM Conference on Intelligent User
          Interfaces (IUI)},
      month    = {May},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2015-Thomaz-IMEARWSFASFS.pdf},
      title    = {Inferring Meal Eating Activities in Real World
          Settings from Ambient Sounds: A Feasibility Study},
      year    = {2015}
    }

Abstract

Dietary self-monitoring has been shown to be an effective method for weight loss, but it remains an onerous task despite recent advances in food journaling systems. Semi-automated food journaling can reduce the effort of logging, but often requires that eating activities be detected automatically. In this work we describe results from a feasibility study conducted in the wild where eating activities were inferred from ambient sounds captured with a wrist-mounted device; twenty participants wore the device during one day for an average of 5 hours while performing normal everyday activities. Our system was able to identify meal eating with an F-score of 79.8% in a person-dependent evaluation, and with 86.6% accuracy in a person-independent evaluation. Our approach is intended to be practical, leveraging off-the-shelf devices with audio sensing capabilities in contrast to systems for automated dietary assessment based on specialized sensors.
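
A minimal version of this kind of audio pipeline, under assumed details (MFCC features over one-second frames and a random-forest classifier, with placeholder file names), might look like the sketch below; the paper's actual features, classifier, and evaluation protocol are not reproduced.

    # Sketch of a generic ambient-audio eating detector: per-second MFCC
    # features fed to a classifier. File names below are placeholders.
    import numpy as np
    import librosa
    from sklearn.ensemble import RandomForestClassifier

    def frame_features(wav_path, sr=16000, frame_s=1.0):
        """Return one mean-MFCC feature vector per frame_s-second chunk."""
        y, sr = librosa.load(wav_path, sr=sr)
        hop = int(sr * frame_s)
        feats = []
        for start in range(0, len(y) - hop, hop):
            mfcc = librosa.feature.mfcc(y=y[start:start + hop], sr=sr, n_mfcc=13)
            feats.append(mfcc.mean(axis=1))
        return np.array(feats)

    meal = frame_features("meal_example.wav")        # hypothetical recordings
    ambient = frame_features("ambient_example.wav")
    X = np.vstack([meal, ambient])
    y = np.concatenate([np.ones(len(meal)), np.zeros(len(ambient))])
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print("frames classified as eating:", int(clf.predict(X).sum()))

The person-dependent versus person-independent numbers in the abstract correspond to whether such a classifier is trained and tested on the same participant or on held-out participants.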


Paper in AISTATS 2013 “Beyond Sentiment: The Manifold of Human Emotions”

April 29th, 2013 Irfan Essa Posted in AAAI/IJCAI/UAI, Behavioral Imaging, Computational Journalism, Machine Learning, Papers, WWW

  • S. Kim, F. Li, G. Lebanon, and I. A. Essa (2013), “Beyond Sentiment: The Manifold of Human Emotions,” in Proceedings of AISTATS, 2013. [PDF] [BIBTEX]
    @InProceedings{    2012-Kim-BSMHE,
      author  = {Seungyeon Kim and Fuxin Li and Guy Lebanon and
          Irfan A. Essa},
      booktitle  = {Proceedings of AISTATS},
      pdf    = {http://arxiv.org/pdf/1202.1568v1},
      title    = {Beyond Sentiment: The Manifold of Human Emotions},
      year    = {2013}
    }

Abstract

Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher-dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without the manifold, we are also able to visualize different notions of positive sentiment in different domains.

via [arXiv.org 1202.1568] Beyond Sentiment: The Manifold of Human Emotions.
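
As a loose illustration of the abstract's idea of a continuous emotion space, as opposed to a finite label set, the sketch below embeds a handful of TF-IDF document vectors into two dimensions with multidimensional scaling; this is only a stand-in for the manifold model in the paper, and the toy documents are invented.

    # Loose illustration of "a continuous space of emotions rather than a
    # finite label set" (not the paper's model): embed TF-IDF document
    # vectors into a 2-D space where nearby points carry related emotions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.manifold import MDS

    docs = [
        "what a wonderful, joyful day",            # toy documents; real input
        "I am thrilled and grateful",              # would be a large corpus of
        "this is so frustrating and infuriating",  # mood-annotated posts
        "I feel anxious and afraid",
        "such a calm, peaceful evening",
    ]
    X = TfidfVectorizer().fit_transform(docs).toarray()
    coords = MDS(n_components=2, random_state=0).fit_transform(X)
    for doc, (x, y) in zip(docs, coords):
        print(f"({x:+.2f}, {y:+.2f})  {doc}")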


Paper in IEEE CVPR 2012: “Detecting Regions of Interest in Dynamic Scenes with Camera Motions”

June 16th, 2012 Irfan Essa Posted in Activity Recognition, Kihwan Kim, Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, PERSEAS, Visual Surveillance

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [DOI] [BLOG] [BIBTEX]
    @InProceedings{    2012-Kim-DRIDSWCM,
      author  = {Kihwan Kim and Dongryeol Lee and Irfan Essa},
      blog    = {http://prof.irfanessa.com/2012/04/09/paper-cvpr2012/},
      booktitle  = {Proceedings of IEEE Conference on Computer Vision
          and Pattern Recognition (CVPR)},
      doi    = {10.1109/CVPR.2012.6247809},
      pdf    = {http://www.cc.gatech.edu/~irfan/p/2012-Kim-DRIDSWCM.pdf},
      publisher  = {IEEE Computer Society},
      title    = {Detecting Regions of Interest in Dynamic Scenes
          with Camera Motions},
      url    = {http://www.cc.gatech.edu/cpl/projects/roi/},
      video    = {http://www.youtube.com/watch?v=19BMwDMCSp8},
      year    = {2012}
    }

Abstract

We present a method to detect regions of interest in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle video from an uncalibrated moving camera. We use the stochastic field to predict important future regions of interest as the scene evolves dynamically.

We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient and provides better predictions than previously proposed RBF-based approaches.

Presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, Providence, RI, June 16-21, 2012
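
A simplified version of the approach's core idea (sparse tracked motion vectors turned into a dense, uncertainty-aware vector field via Gaussian process regression) is sketched below; the kernel and synthetic tracks are illustrative assumptions, not the paper's configuration.

    # Simplified sketch: sparse tracked motion vectors -> dense stochastic
    # vector field via Gaussian process regression.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    # Sparse observations: (x, y) positions of tracked objects and their (dx, dy).
    positions = rng.uniform(0, 1, size=(40, 2))
    motions = np.column_stack([np.sin(2 * np.pi * positions[:, 1]),
                               np.cos(2 * np.pi * positions[:, 0])])
    motions += rng.normal(0, 0.05, motions.shape)   # tracking noise

    kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.01)
    gp = GaussianProcessRegressor(kernel=kernel).fit(positions, motions)

    # Dense field over the frame, with per-location predictive uncertainty.
    gx, gy = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    mean_field, std_field = gp.predict(grid, return_std=True)
    # Regions with strong, consistent predicted flow are candidate regions
    # of interest as the scene evolves.
    print(mean_field.shape, std_field.shape)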


Paper (2011) in IEEE PAMI: “Bilayer Segmentation of Webcam Videos Using Tree-Based Classifiers”

January 12th, 2011 Irfan Essa Posted in Antonio Criminisi, Computational Photography and Video, John Winn, Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin

Bilayer Segmentation of Webcam Videos Using Tree-Based Classifiers

P. Yin, A. Criminisi, J. Winn, and I. Essa (2011), “Bilayer Segmentation of Webcam Videos Using Tree-Based Classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, January 2011, ISSN: 0162-8828, DOI: 10.1109/TPAMI.2010.65, IEEE Computer Society. [Project Page | DOI]

Abstract

This paper presents an automatic segmentation algorithm for video frames captured by a (monocular) webcam that closely approximates depth segmentation from a stereo camera. The frames are segmented into foreground and background layers that comprise a subject (participant) and other objects and individuals. The algorithm produces correct segmentations even in the presence of large background motion with a nearly stationary foreground. This research makes three key contributions: First, we introduce a novel motion representation, referred to as “motons,” inspired by research in object recognition. Second, we propose estimating the segmentation likelihood from the spatial context of motion. The estimation is efficiently learned by random forests. Third, we introduce a general taxonomy of tree-based classifiers that facilitates both theoretical and experimental comparisons of several known classification algorithms and generates new ones. In our bilayer segmentation algorithm, diverse visual cues such as motion, motion context, color, contrast, and spatial priors are fused by means of a conditional random field (CRF) model. Segmentation is then achieved by binary min-cut. Experiments on many sequences of our videochat application demonstrate that our algorithm, which requires no initialization, is effective in a variety of scenes, and the segmentation results are comparable to those obtained by stereo systems.

via IEEE Xplore – Abstract Page.
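
The final fusion step described in the abstract, unary likelihoods combined with a pairwise smoothness term and solved by binary min-cut, can be illustrated with the small PyMaxflow sketch below; the moton features, random-forest likelihoods, and full CRF of the paper are not reproduced, and the foreground probabilities here are synthetic.

    # Tiny illustration of the min-cut fusion step only (not the full paper
    # pipeline): synthetic per-pixel foreground probabilities, a constant
    # pairwise smoothness weight, and a binary graph cut via PyMaxflow.
    import numpy as np
    import maxflow

    rng = np.random.default_rng(0)
    h, w = 60, 80
    # Stand-in per-pixel foreground probability (in the paper this comes from
    # the learned motion/color/contrast model); here a noisy centered blob.
    yy, xx = np.mgrid[0:h, 0:w]
    p_fg = np.clip(np.exp(-(((yy - 30) ** 2 + (xx - 40) ** 2) / 300.0))
                   + rng.normal(0, 0.15, (h, w)), 1e-3, 1 - 1e-3)

    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes((h, w))
    g.add_grid_edges(nodes, 2.0)                                # smoothness weight
    g.add_grid_tedges(nodes, -np.log(p_fg), -np.log(1 - p_fg))  # unary terms
    g.maxflow()
    layers = g.get_grid_segments(nodes)   # boolean mask separating the two layers
    print("pixels assigned to one layer:", int(layers.sum()))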
