Thesis: Raffay Hamid PhD (2008) “A Computational Framework For Unsupervised Analysis of Everyday Human Activities”

June 18th, 2008 Irfan Essa Posted in Aaron Bobick, Activity Recognition, Machine Learning, PhD, Raffay Hamid

M. Raffay Hamid (2008), “A Computational Framework For Unsupervised Analysis of Everyday Human Activities”, PhD Thesis, Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisors: Aaron Bobick & Irfan Essa)

Abstract

In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. 

A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event subsequences. With this perspective at hand, we investigate in particular representations that characterize activities in terms of their fixed- and variable-length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality, and noise sensitivity.
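As an illustration of the fixed-length case, the following sketch encodes an activity by its histogram of length-n event subsequences (n-grams); the event vocabulary and toy activity are invented for this example, not taken from the thesis:

```python
from collections import Counter

def event_ngrams(events, n):
    """Histogram of the length-n event subsequences of one activity."""
    return Counter(tuple(events[i:i + n])
                   for i in range(len(events) - n + 1))

# Invented activity: a finite sequence of discrete kitchen events.
activity = ["open_fridge", "take_milk", "close_fridge",
            "open_fridge", "take_milk"]
hist = event_ngrams(activity, n=2)
```

Two activities can then be compared by comparing their subsequence histograms, which is what makes this local encoding usable for the class-discovery step described below the abstract.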

Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance into one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
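The clique-based discovery step can be sketched in miniature. The thesis describes an efficient discovery method; the brute-force stand-in below, with an invented similarity matrix and threshold, only illustrates the idea of activity-classes as maximal cliques of mutually similar activities:

```python
from itertools import combinations

def discover_activity_classes(sim, threshold):
    """Maximal cliques of mutually similar activities (brute force).

    sim is a symmetric activity-similarity matrix; two activities are
    connected when their similarity meets the threshold. Feasible only
    for toy sizes; the thesis describes an efficient discovery method.
    """
    n = len(sim)

    def edge(i, j):
        return sim[i][j] >= threshold

    cliques = []
    for size in range(n, 0, -1):            # largest cliques first
        for cand in combinations(range(n), size):
            if all(edge(i, j) for i, j in combinations(cand, 2)):
                # Keep only maximal cliques (not contained in a larger one).
                if not any(set(cand) <= c for c in cliques):
                    cliques.append(set(cand))
    return cliques

# Invented similarity matrix: activities 0-2 form one class, 3 stands alone.
sim = [[1.0, 0.9, 0.8, 0.1],
       [0.9, 1.0, 0.85, 0.2],
       [0.8, 0.85, 1.0, 0.1],
       [0.1, 0.2, 0.1, 1.0]]
classes = discover_activity_classes(sim, threshold=0.7)
```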


Paper: ICASSP (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”

April 3rd, 2008 Irfan Essa Posted in 0205507, Face and Gesture, Funding, James Rehg, Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner

Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 – March 30 – April 4, 2008 – Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 – 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)

ABSTRACT

We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often causes difficulty in applying feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), where the state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm, where the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves traditional HMM recognition in various domains. The reduction of error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.
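A minimal sketch of the "piecewise i.i.d." decoupling, assuming frames have already been assigned to HMM states by a segmental k-means style alignment. In place of the paper's boosting step, a simple Fisher-style score ranks features per state; the data and score here are illustrative stand-ins, not the paper's algorithm:

```python
import numpy as np

def fisher_scores(X, y):
    """Between-class over within-class variance per feature (two classes)."""
    X0, X1 = X[y == 0], X[y == 1]
    between = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    within = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return between / within

def segmental_feature_selection(frames, state_of_frame, labels, k):
    """Rank features per HMM state, treating each state's frames as i.i.d.

    This mirrors the decoupling in the abstract: once frames are assigned
    to states, a static discriminative criterion is applied per state.
    """
    state_of_frame = np.asarray(state_of_frame)
    selected = {}
    for s in np.unique(state_of_frame):
        mask = state_of_frame == s
        scores = fisher_scores(frames[mask], labels[mask])
        selected[int(s)] = [int(f) for f in np.argsort(scores)[::-1][:k]]
    return selected

# Toy data: feature 0 separates the classes in state 0, feature 1 in state 1.
frames = np.array([[0.0, 5.0], [0.2, 5.2], [3.0, 5.0], [3.2, 5.2],
                   [5.0, 0.0], [5.2, 0.2], [5.0, 3.0], [5.2, 3.2]])
states = [0, 0, 0, 0, 1, 1, 1, 1]
labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])
selected = segmental_feature_selection(frames, states, labels, k=1)
```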


Funding: NSF/SGER (2007) “Persistent, Adaptive, Collaborative Synthespians”

September 15th, 2007 Irfan Essa Posted in Charles Isbell, Machine Learning

Award#0749181 – SGER Collaborative Research: Persistent, Adaptive, Collaborative Synthespians
ABSTRACT

This project explores the development of methodologies for populating virtual worlds with persistent, adaptive, collaborative, believable synthetic actors, referred to as Synthespians. These methods extend adaptive models of learning and planning to accommodate the complex, dynamic environments of massive multi-player online games. The intellectual merit includes the development and evaluation of: (1) a behavior development language, with discovery, machine learning, and adaptation of behaviors directly integrated into the language, allowing for the rapid development and deployment of Synthespians; and (2) a framework for the actors to recognize and discover plans by observing and modeling the activities of other agents. An expected outcome of this research is the ability to author complex virtual worlds with many participants that support intelligent and effective interaction between people and machines. Broader Impact: A scientific understanding of how we interact with each other and collaborate will benefit from our ability to simulate complex environments with dynamic and evolving individual and group behaviors. In this project, building and modeling such environments and behaviors is done within a gaming context. In the long run, this work will affect and change the fields of education and entertainment. In addition, being able to model large collaborative and interactive scenarios will also help us understand and model the large-scale social dynamics phenomena of interest to sociologists and economists.


Paper: IEEE CVPR (2007) “Tree-based Classifiers for Bilayer Video Segmentation”

June 17th, 2007 Irfan Essa Posted in 0205507, Antonio Criminisi, Computational Photography and Video, Funding, John Winn, Machine Learning, Papers, Pei Yin, Research

Pei Yin, Antonio Criminisi, John Winn, and Irfan Essa (2007), “Tree-based Classifiers for Bilayer Video Segmentation,” In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’07), 17-22 June 2007, pages 1–8, Minneapolis, MN, USA, ISBN: 1-4244-1180-7, Digital Object Identifier: 10.1109/CVPR.2007.383008

Abstract

This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, “motons”, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.
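The CRF energy with unary likelihood terms and pairwise smoothness can be illustrated in one dimension, where the binary labelling of a chain is solved exactly by dynamic programming rather than min-cut. The costs below are invented; the paper's model fuses many more cues (motons, motion context, colour, contrast, spatial priors):

```python
def scanline_segmentation(unary_fg, unary_bg, pairwise, smooth=1.0):
    """Exact minimum-energy binary labelling of a 1-D chain CRF via DP.

    Label 0 = background, 1 = foreground. unary_* give per-pixel costs,
    pairwise gives the per-edge penalty for switching labels (in the
    paper this penalty would be contrast-modulated).
    """
    cost = [[unary_bg[0], unary_fg[0]]]
    back = []
    for i in range(1, len(unary_fg)):
        row, ptr = [], []
        for lab in range(2):
            prev = [cost[-1][p] + (smooth * pairwise[i - 1] if p != lab else 0.0)
                    for p in range(2)]
            p = 0 if prev[0] <= prev[1] else 1
            row.append(prev[p] + (unary_bg[i] if lab == 0 else unary_fg[i]))
            ptr.append(p)
        cost.append(row)
        back.append(ptr)
    # Backtrack from the cheaper final label.
    lab = 0 if cost[-1][0] <= cost[-1][1] else 1
    labels = [lab]
    for ptr in reversed(back):
        lab = ptr[lab]
        labels.append(lab)
    return labels[::-1]

# Invented costs: the left half looks like background, the right like foreground.
labels = scanline_segmentation(unary_fg=[5, 5, 0, 0],
                               unary_bg=[0, 0, 5, 5],
                               pairwise=[1, 1, 1])
```

On a 2-D pixel grid the same binary energy is no longer a chain, which is why the paper minimizes it with a graph min-cut instead.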


Paper: Asilomar Conference (2003) “Boosted audio-visual HMM for speech reading”

November 9th, 2003 Irfan Essa Posted in 0205507, Face and Gesture, Funding, James Rehg, Machine Learning, Papers, Pei Yin

P. Yin, I. Essa, and J. M. Rehg (2003), “Boosted audio-visual HMM for speech reading,” In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2003, 9-12 Nov. 2003, Volume: 2, pages 2013–2018, ISBN: 0-7803-8104-1, INSPEC Accession Number: 8555396, Digital Object Identifier: 10.1109/ACSSC.2003.1292334

Abstract

We propose a new approach for combining acoustic and visual measurements to aid in recognizing the lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) HMMs used to model phonemes from the acoustic signal, and (b) HMMs used to model visual feature motions from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost on the basis of their discriminative ability. This form of integration, leading to boosted HMMs, permits AdaBoost to find the best features first, and then uses HMMs to exploit the dynamic information inherent in the signal.
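As a minimal, hypothetical illustration of combining acoustic and visual HMM scores (the paper's actual integration is via the boosted HMM, not this simple late fusion), per-class log-likelihoods from the two modality-specific HMMs can be merged with a weighted sum before taking the arg-max class:

```python
def fuse_and_classify(audio_ll, video_ll, weight=0.5):
    """Weighted late fusion of per-class HMM log-likelihoods."""
    combined = {c: weight * audio_ll[c] + (1 - weight) * video_ll[c]
                for c in audio_ll}
    return max(combined, key=combined.get)

# Invented per-class scores from the acoustic and visual HMMs.
audio_ll = {"pa": -10.0, "ba": -12.0}
video_ll = {"pa": -20.0, "ba": -5.0}
best = fuse_and_classify(audio_ll, video_ll, weight=0.5)
```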


Paper: ICPR (2002) “Learning video processing by example”

August 11th, 2002 Irfan Essa Posted in Antonio Haro, Collaborators, Computational Photography and Video, Machine Learning, PAMI/ICCV/CVPR/ECCV

A. Haro and I. Essa (2002), “Learning video processing by example,” In Proceedings of the 16th International Conference on Pattern Recognition, 2002, 11-15 Aug. 2002, Volume: 1, pages 487–491, ISSN: 1051-4651, ISBN: 0-7695-1695-X, Digital Object Identifier: 10.1109/ICPR.2002.1044771

Abstract

We present an algorithm that approximates the output of an arbitrary video processing algorithm based on a pair of input and output exemplars. Our algorithm relies on learning the mapping between the input and output exemplars to model the processing that has taken place. We approximate the processing by observing that pixel neighborhoods similar in appearance and motion to those in the exemplar input should result in neighborhoods similar to the exemplar output. Since there are not many pixel neighborhoods in the exemplars, we use techniques from texture synthesis to generalize to neighborhoods not observed in the exemplars. The same algorithm is used to learn such processing as motion blur, color correction, and painting.
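A stripped-down sketch of the neighborhood-matching idea, assuming grayscale images and appearance-only patches (the paper also matches motion): each pixel of a new input copies the exemplar-output value whose exemplar-input neighborhood is nearest in L2 distance.

```python
import numpy as np

def apply_by_example(exemplar_in, exemplar_out, new_in, radius=1):
    """Copy, per pixel, the exemplar output whose input patch matches best."""
    r = radius
    pad_ex = np.pad(exemplar_in, r, mode="edge")
    pad_new = np.pad(new_in, r, mode="edge")
    eh, ew = exemplar_in.shape
    # Flatten every exemplar-input patch and remember its output pixel.
    patches = np.array([pad_ex[i:i + 2*r + 1, j:j + 2*r + 1].ravel()
                        for i in range(eh) for j in range(ew)], dtype=float)
    outs = np.array([exemplar_out[i, j] for i in range(eh) for j in range(ew)])
    result = np.empty_like(new_in)
    for i in range(new_in.shape[0]):
        for j in range(new_in.shape[1]):
            q = pad_new[i:i + 2*r + 1, j:j + 2*r + 1].ravel().astype(float)
            k = np.argmin(((patches - q) ** 2).sum(axis=1))
            result[i, j] = outs[k]
    return result

# Exemplar pair encodes a hypothetical "add 10 brightness" processing step.
exemplar_in = np.arange(9).reshape(3, 3)
exemplar_out = exemplar_in + 10
restored = apply_by_example(exemplar_in, exemplar_out, exemplar_in.copy())
```

Brute-force nearest-neighbor search is used here only for clarity; the generalization to unseen neighborhoods is what the paper borrows from texture synthesis.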
