
Paper (2009): ICASSP “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection”

February 4th, 2009 Irfan Essa Posted in 0205507, Face and Gesture, ICASSP, James Rehg, Numerical Machine Learning, Pei Yin, Thad Starner

Pei Yin, Thad Starner, Harley Hamilton, Irfan Essa, James M. Rehg (2009), "Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection" in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2009. Session: Spoken Language Understanding I, Tuesday, April 21, 11:00 – 13:00, Taipei, Taiwan.

ABSTRACT

The natural language for most deaf signers in the United States is American Sign Language (ASL). ASL has internal structure like spoken languages, and ASL linguists have introduced several phonemic models. The study of ASL phonemes is not only interesting to linguists, but also useful for scalability in machine recognition. Since machine perception differs from human perception, this paper learns the basic units for ASL directly from data. Compared with previous studies, our approach computes a set of data-driven units (fenemes) discriminatively from the results of segmental feature selection. The learning iterates the following two steps: first apply discriminative feature selection segmentally to the signs, and then tie the most similar temporal segments to re-train. Intuitively, the sign parts indistinguishable to machines are merged to form basic units, which we call ASL fenemes. Experiments on publicly available ASL recognition data show that the extracted data-driven fenemes are meaningful, and recognition using those fenemes achieves improved accuracy at reduced model complexity.
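
For readers who want to see the shape of the segment-select-tie idea, here is a minimal Python sketch of one pass of the procedure described in the abstract. It is not the authors' implementation: the equal-length segmentation, the Fisher-style score standing in for the boosting-based discriminative selection, and the cosine-similarity tying rule are all simplifications introduced for illustration, and in the paper the two steps alternate with HMM retraining.

```python
# Illustrative sketch only: segment each sign, score features discriminatively
# per segment, then tie near-identical segments into shared "fenemes".
import numpy as np

def segment(seq, n_states):
    """Split a (T, D) sequence into n_states contiguous segments."""
    return np.array_split(seq, n_states)

def fisher_scores(segments_by_sign):
    """Per-dimension score: between-sign variance over within-sign variance."""
    means = {s: np.vstack(segs).mean(axis=0) for s, segs in segments_by_sign.items()}
    grand = np.mean(list(means.values()), axis=0)
    between = np.mean([(m - grand) ** 2 for m in means.values()], axis=0)
    within = np.mean([np.vstack(segs).var(axis=0) for segs in segments_by_sign.values()], axis=0)
    return between / (within + 1e-8)

def tie_fenemes(dataset, n_states=3, sim_threshold=0.98):
    """dataset: dict mapping sign label -> list of (T, D) observation sequences."""
    # Step 1: segment every example and score features across signs.
    segments_by_sign = {s: [seg for q in seqs for seg in segment(q, n_states)]
                        for s, seqs in dataset.items()}
    weights = fisher_scores(segments_by_sign)
    # Step 2: tie (sign, state) segments whose weighted means are near-identical.
    keys = [(s, k) for s in dataset for k in range(n_states)]
    means = np.vstack([np.vstack([segment(q, n_states)[k] for q in dataset[s]]).mean(axis=0) * weights
                       for s, k in keys])
    unit = means / (np.linalg.norm(means, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    feneme_of = {key: i for i, key in enumerate(keys)}
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if sim[i, j] > sim_threshold:                # indistinguishable to the machine
                feneme_of[keys[j]] = feneme_of[keys[i]]  # merge into one feneme
    return feneme_of, weights
```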


Paper: ICPR (2008) “3D Shape Context and Distance Transform for Action Recognition”

December 8th, 2008 Irfan Essa Posted in Activity Recognition, Aware Home, Face and Gesture, Franzi Meier, Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers

M. Grundmann, F. Meier, and I. Essa (2008) “3D Shape Context and Distance Transform for Action Recognition”, In Proceedings of International Conference on Pattern Recognition (ICPR) 2008, Tampa, FL. [Project Page | DOI | PDF]

ABSTRACT

We propose the use of 3D (2D+time) Shape Context to recognize the spatial and temporal details inherent in human actions. We represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time. A non-uniform sampling method is introduced that gives preference to fast moving body parts using a Euclidean 3D Distance Transform. Actions are then classified by matching the extracted point clouds. Our proposed approach is based on a global matching and does not require specific training to learn the model. We test the approach thoroughly on two publicly available datasets and compare to several state-of-the-art methods. The achieved classification accuracy is on par with or superior to the best results reported to date.
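
As a rough illustration of the sampling step, the sketch below turns a 2D+t silhouette volume into a 3D point cloud, biasing samples with a Euclidean distance transform. The inverse-distance weighting is my own simplification of the paper's preference for fast-moving body parts, and the function name and parameters are illustrative, not taken from the paper.

```python
# Sketch, assuming silhouettes are stacked into a boolean (T, H, W) volume.
import numpy as np
from scipy import ndimage

def sample_action_cloud(silhouettes, n_points=500, rng=None):
    """silhouettes: boolean array (T, H, W); returns an (n_points, 3) cloud of (t, y, x)."""
    rng = np.random.default_rng(rng)
    volume = silhouettes.astype(bool)
    # Distance of every foreground voxel to the nearest background voxel in 2D+t.
    dist = ndimage.distance_transform_edt(volume)
    coords = np.argwhere(volume)
    # Favor voxels with small distance values (thin, fast-changing regions).
    weights = 1.0 / (1.0 + dist[volume])
    weights /= weights.sum()
    idx = rng.choice(len(coords), size=min(n_points, len(coords)), replace=False, p=weights)
    return coords[idx].astype(float)
```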


Paper: ICASSP (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”

April 3rd, 2008 Irfan Essa Posted in 0205507, Face and Gesture, Funding, James Rehg, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner

Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 – March 30 – April 4, 2008 – Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 – 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)

ABSTRACT

We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often causes difficulty in applying feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), where the state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm, where the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves traditional HMM recognition in various domains. The reduction of error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.
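
The sketch below compresses the SBHMM idea into a few lines: treat frames as roughly i.i.d. within a state, learn one AdaBoost classifier per state, and re-embed each frame as a vector of per-state boosted confidences before (re)training an HMM on the new features. The k-means assignment stands in for the segmental k-means alignment, and no HMM is actually retrained here; everything is an illustrative approximation rather than the paper's algorithm.

```python
# Sketch of segmentally boosted feature construction (simplified, not the paper's code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def boosted_embedding(sequences, n_states=5, n_estimators=25):
    """sequences: list of (T_i, D) arrays. Returns a function mapping frames -> boosted scores."""
    frames = np.vstack(sequences)
    # Stand-in for segmental k-means: assign every frame to a pseudo-state.
    states = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit_predict(frames)
    # One boosted classifier per state, trained one-vs-rest on the frames.
    boosters = []
    for s in range(n_states):
        clf = AdaBoostClassifier(n_estimators=n_estimators, random_state=0)
        clf.fit(frames, (states == s).astype(int))
        boosters.append(clf)

    def embed(seq):
        # Each frame becomes a vector of per-state boosted confidences.
        return np.column_stack([b.predict_proba(seq)[:, 1] for b in boosters])

    return embed
```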


Paper: Asilomar Conference (2003) “Boosted audio-visual HMM for speech reading”

November 9th, 2003 Irfan Essa Posted in 0205507, Face and Gesture, Funding, James Rehg, Numerical Machine Learning, Papers, Pei Yin

Yin, P., Essa, I., Rehg, J.M. (2003) "Boosted audio-visual HMM for speech reading." In Proceedings of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2003, 9-12 Nov. 2003, Volume: 2, pp. 2013-2018, ISBN: 0-7803-8104-1, INSPEC Accession Number: 8555396, Digital Object Identifier: 10.1109/ACSSC.2003.1292334

Abstract

We propose a new approach for combining acoustic and visual measurements to aid in recognizing lip shapes of a person speaking. Our method relies on computing the maximum likelihoods of (a) an HMM used to model phonemes from the acoustic signal, and (b) an HMM used to model visual feature motions from video. One significant addition in this work is the dynamic analysis with features selected by AdaBoost on the basis of their discriminative ability. This form of integration, leading to a boosted HMM, permits AdaBoost to find the best features first and then uses the HMM to exploit the dynamic information inherent in the signal.
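
One way to read the integration step is as likelihood-level fusion of per-class acoustic and visual HMMs, sketched below. The models here are any objects exposing a score(sequence) log-likelihood method (hmmlearn's GaussianHMM would do), and the mixing weight alpha is a free parameter of this sketch, not a value from the paper.

```python
# Late-fusion sketch: combine audio and visual HMM log-likelihoods per class.
import numpy as np

def classify_av(audio_seq, visual_seq, audio_hmms, visual_hmms, alpha=0.5):
    """audio_hmms / visual_hmms: dicts mapping class label -> trained HMM with .score()."""
    scores = {}
    for label in audio_hmms:
        ll_audio = audio_hmms[label].score(audio_seq)     # log P(audio | class)
        ll_visual = visual_hmms[label].score(visual_seq)  # log P(visual | class)
        scores[label] = alpha * ll_audio + (1.0 - alpha) * ll_visual
    return max(scores, key=scores.get), scores
```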


Paper: AI Magazine (1999) “Computers Seeing People”

July 14th, 1999 Irfan Essa Posted in Aware Home, Face and Gesture, Intelligent Environments, Papers

Irfan A. Essa (1999) "Computers Seeing People", AI Magazine, 20(2): 69-82, Summer 1999.

Abstract

AI researchers are interested in building intelligent machines that can interact with them as they interact with each other. Science fiction writers have given us these goals in the form of HAL in 2001: A Space Odyssey and Commander Data in Star Trek: The Next Generation. However, at present, our computers are deaf, dumb, and blind, almost unaware of the environment they are in and of the user who interacts with them. In this article, I present the current state of the art in machines that can see people, recognize them, determine their gaze, understand their facial expressions and hand gestures, and interpret their activities. I believe that building machines with such abilities for perceiving people will take us one step closer to building HAL and Commander Data.


Paper: IEEE PAMI (1997) “Coding, analysis, interpretation, and recognition of facial expressions”

July 14th, 1997 Irfan Essa Posted in Affective Computing, Face and Gesture, PAMI/ICCV/CVPR/ECCV, Papers, Research, Sandy Pentland

Essa, I.A., Pentland, A.P. (1997) "Coding, analysis, interpretation, and recognition of facial expressions." In IEEE Transactions on Pattern Analysis and Machine Intelligence, July 1997, Volume: 19, Issue: 7, pp. 757-763, ISSN: 0162-8828, CODEN: ITPIDJ, INSPEC Accession Number: 5661539, Digital Object Identifier: 10.1109/34.598232

Abstract

We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face’s independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the facial action coding system (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate representation of human facial expressions that we call FACS+. Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.
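
Purely as an illustration of the front end, the snippet below computes dense optical flow between two face frames and pools it over hand-picked regions as a crude stand-in for the paper's physically based muscle model. OpenCV's Farneback flow is a generic estimator, not the optimal-estimation (Kalman-style) flow used in the paper, and the region boxes are assumed inputs rather than anything from the published system.

```python
# Simplified sketch: per-region mean motion from dense optical flow on a face.
import cv2
import numpy as np

def region_motion(prev_gray, next_gray, regions):
    """regions: dict name -> (y0, y1, x0, x1) boxes on the face; returns mean flow per region."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    activations = {}
    for name, (y0, y1, x0, x1) in regions.items():
        patch = flow[y0:y1, x0:x1]           # (h, w, 2) displacement vectors
        activations[name] = patch.reshape(-1, 2).mean(axis=0)
    return activations
```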


NYT Science Times (1997): “Laugh and Your Computer Will Laugh With You, Someday”

January 7th, 1997 Irfan Essa Posted in Face and Gesture, In The News, Research

Daniel Goldman (1997) "Laugh and Your Computer Will Laugh With You, Someday", New York Times, Science Times, January 7, 1997.

Quote from the article: “Emotions like fear, sadness and anger each announce themselves through a unique signature of changes in facial muscle, vocal inflection, physiological arousal, and other such cues. Building on techniques of pattern recognition already used for computer comprehension of words and images, Dr. Irfan Essa, a computer scientist at Georgia Tech, has constructed a computer system that can read people’s emotions from changes in their facial expression.”

This image was used in the article.


Paper: IEEE PAMI (1996) “Task-specific gesture analysis in real-time using interpolated views”

December 14th, 1996 Irfan Essa Posted in Activity Recognition, Face and Gesture, PAMI/ICCV/CVPR/ECCV, Papers, Research, Sandy Pentland

Darrell, T.J.; Essa, I.A.; Pentland, A.P., "Task-specific gesture analysis in real-time using interpolated views", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 12, pp. 1236-1242, Dec 1996
URL: [ieeexplore.ieee.org] [DOI]

Abstract

Hand and face gestures are modeled using an appearance-based approach in which patterns are represented as a vector of similarity scores to a set of view models defined in space and time. These view models are learned from examples using unsupervised clustering techniques. A supervised learning paradigm is then used to interpolate view scores into a task-dependent coordinate system appropriate for recognition and control tasks. We apply this analysis to the problem of context-specific gesture interpolation and recognition, and demonstrate real-time systems which perform these tasks.
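
A toy version of the pipeline: score an input frame against stored view exemplars, then map the score vector into task coordinates with a learned interpolator. Normalized correlation and scipy's RBFInterpolator are illustrative stand-ins for the similarity measure and the supervised interpolation used in the paper, not the paper's exact formulation.

```python
# Sketch: similarity-score representation plus a learned score -> task-coordinate mapping.
import numpy as np
from scipy.interpolate import RBFInterpolator

def similarity_scores(frame, view_models):
    """frame: (H, W) image; view_models: (K, H, W) exemplars. Returns K normalized correlations."""
    f = (frame - frame.mean()) / (frame.std() + 1e-8)
    scores = []
    for v in view_models:
        vz = (v - v.mean()) / (v.std() + 1e-8)
        scores.append(float((f * vz).mean()))
    return np.array(scores)

def fit_view_interpolator(training_frames, view_models, task_coords):
    """Learn a score-vector -> task-coordinate mapping from labeled examples."""
    score_vectors = np.vstack([similarity_scores(f, view_models) for f in training_frames])
    return RBFInterpolator(score_vectors, np.asarray(task_coords))
```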


Event: International Conference on Face and Gesture Recognition (1996).

October 13th, 1996 Irfan Essa Posted in Events, Face and Gesture, Sandy Pentland

International Conference on Face and Gesture Recognition (FG) 1996, October 13-16, 1996, Killington, Vermont



Paper: ICPR (1996): “Motion regularization for model-based head tracking”

August 25th, 1996 Irfan Essa Posted in Face and Gesture, Intelligent Environments, PAMI/ICCV/CVPR/ECCV, Sumit Basu

S. Basu, I. Essa, A. Pentland (1996) "Motion regularization for model-based head tracking." In Proceedings of the 13th International Conference on Pattern Recognition (ICPR 1996), 25-29 Aug 1996, Volume: 3, pp. 611-616. [DOI | PDF]

Abstract

This paper describes a method for the robust tracking of rigid head motion from video. This method uses a 3D ellipsoidal model of the head and interprets the optical flow in terms of the possible rigid motions of the model. This method is robust to large angular and translational motions of the head and is not subject to the singularities of a 2D model. The method has been successfully applied to heads with a variety of shapes, hair styles, etc. This method also has the advantage of accurately capturing the 3D motion parameters of the head. This accuracy is shown through comparison with a ground truth synthetic sequence (a rendered 3D animation of a model head). In addition, the ellipsoidal model is robust to small variations in the initial fit, enabling the automation of the model initialization. Lastly, due to its consideration of the entire 3D aspect of the head, the tracking is very stable over a large number of frames. This robustness extends even to sequences with very low frame rates and noisy camera images.
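
The core regularization step can be pictured as a small least-squares problem: interpret the observed 2D flow at points on the ellipsoidal model as the projection of a single rigid motion and solve for its rotation and in-plane translation. The sketch below assumes orthographic projection for brevity, which the paper does not; it illustrates the idea rather than the authors' formulation.

```python
# Sketch: least-squares rigid motion (omega, in-plane t) from flow at 3D model points.
import numpy as np

def rigid_motion_from_flow(points_3d, flow_2d):
    """points_3d: (N, 3) ellipsoid points; flow_2d: (N, 2) observed flow.
    Returns (omega, t_xy): angular velocity and in-plane translation."""
    A, b = [], []
    for (x, y, z), (u, v) in zip(points_3d, flow_2d):
        # Velocity of a rigid point: V = omega x X + t; keep the image-plane components.
        A.append([0.0, z, -y, 1.0, 0.0])   # x-velocity coefficients for (wx, wy, wz, tx, ty)
        A.append([-z, 0.0, x, 0.0, 1.0])   # y-velocity coefficients
        b.extend([u, v])
    params, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return params[:3], params[3:]
```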
