
PhD Thesis by Zahoor Zafrulla “Automatic recognition of American Sign Language Classifiers”

May 2nd, 2014 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Face and Gesture, PhD, Thad Starner, Zahoor Zafrulla

Title: Automatic recognition of American Sign Language Classifiers

Zahoor Zafrulla
School of Interactive Computing
College of Computing
Georgia Institute of Technology
http://www.cc.gatech.edu/grads/z/zahoor/

Committee:

Dr. Thad Starner (Advisor, School of Interactive Computing, Georgia Tech)
Dr. Irfan Essa (Co-Advisor, School of Interactive Computing, Georgia Tech)
Dr. Jim Rehg (School of Interactive Computing, Georgia Tech)
Dr. Harley Hamilton (School of Interactive Computing, Georgia Tech)
Dr. Vassilis Athitsos (Computer Science and Engineering Department, University of Texas at Arlington)

Summary:

Automatically recognizing classifier-based grammatical structures of American Sign Language (ASL) is a challenging problem. Classifiers in ASL utilize surrogate hand shapes for people or “classes” of objects and provide information about their location, movement, and appearance. In the past, researchers have focused on the recognition of fingerspelling, isolated signs, facial expressions, and interrogative words such as WH-questions (e.g., Who, What, Where, and When). Challenging problems such as the recognition of ASL sentences and classifier-based grammatical structures remain relatively unexplored in the field of ASL recognition.

One application of recognizing classifiers is in creating educational games that help young deaf children acquire language skills. Previous work developed CopyCat, an educational ASL game that requires children to engage in progressively more difficult expressive signing tasks as they advance through the game.

We have shown that, by leveraging context, we can use verification in place of recognition to boost machine performance in determining whether the signed responses in an expressive signing task, such as the CopyCat game, are correct or incorrect. We have also demonstrated that a machine verifier’s ability to identify sign boundaries can be improved with a novel two-pass technique that combines signed input processed in both the forward and reverse directions. Additionally, we have shown that we can reduce CopyCat’s dependency on custom-manufactured hardware by using an off-the-shelf Microsoft Kinect depth camera while achieving similar verification performance. Finally, we show how we can extend our ability to recognize sign language by leveraging depth maps, developing a method that uses improved hand detection and hand-shape classification to recognize selected classifier-based grammatical structures of ASL.
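For readers curious how verification differs from open-ended recognition in practice, here is a minimal sketch of the idea, assuming per-sign Gaussian HMMs trained with hmmlearn. The even segmentation, the threshold, and the way the reversed input is scored are illustrative assumptions, not the thesis implementation.

    # Minimal sketch: verify a known expected phrase rather than recognize
    # arbitrary signing. Per-sign HMMs, the even segmentation, and the threshold
    # are assumptions for illustration only.
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    def train_sign_model(sequences, n_states=5):
        """Fit one HMM for a sign from a list of (T, D) feature arrays."""
        X = np.concatenate(sequences)
        lengths = [len(s) for s in sequences]
        return GaussianHMM(n_components=n_states, covariance_type="diag",
                           n_iter=20).fit(X, lengths)

    def verify_phrase(observed, expected_signs, models, threshold=-50.0):
        """Accept the signed response iff the expected phrase, scored over both the
        forward and the time-reversed input, clears a per-frame likelihood threshold."""
        def phrase_loglik(seq):
            # Crude stand-in for alignment: split frames evenly across the
            # expected signs and sum each segment's likelihood under its model.
            segments = np.array_split(seq, len(expected_signs))
            return sum(models[s].score(seg) for s, seg in zip(expected_signs, segments))
        two_pass = 0.5 * (phrase_loglik(observed) + phrase_loglik(observed[::-1]))
        return two_pass / len(observed) > threshold

Because the game already knows which phrase the child was asked to sign, the verifier only has to accept or reject that one hypothesis, which is an easier decision than full recognition.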


Paper in IEEE ISBI 2014 “Automated Surgical OSATS Prediction from Videos”

April 28th, 2014 Irfan Essa Posted in Behavioral Imaging, Health Systems, Medical, Papers, Thomas Ploetz, Yachna Sharma

  • Y. Sharma, T. Ploetz, N. Hammerla, S. Mellor, R. McNaney, P. Oliver, S. Deshmukh, A. McCaskie, and I. Essa (2014), “Automated Surgical OSATS Prediction from Videos,” in Proceedings of IEEE International Symposium on Biomedical Imaging, Beijing, CHINA, 2014. [PDF] [BIBTEX]
    @inproceedings{2014-Sharma-ASOPFV,
      Abstract = {The assessment of surgical skills is an essential part of medical training. The prevalent manual evaluations by expert surgeons are time consuming and often their outcomes vary substantially from one observer to another. We present a video-based framework for automated evaluation of surgical skills based on the Objective Structured Assessment of Technical Skills (OSATS) criteria. We encode the motion dynamics via frame kernel matrices, and represent the motion granularity by texture features. Linear discriminant analysis is used to derive a reduced dimensionality feature space followed by linear regression to predict OSATS skill scores. We achieve statistically significant correlation (p-value < 0.01) between the ground-truth (given by domain experts) and the OSATS scores predicted by our framework.},
      Address = {Beijing, CHINA},
      Author = {Yachna Sharma and Thomas Ploetz and Nils Hammerla and Sebastian Mellor and Roisin McNaney and Patrick Oliver and Sandeep Deshmukh and Andrew McCaskie and Irfan Essa},
      Booktitle = {{Proceedings of IEEE International Symposium on Biomedical Imaging}},
      Date-Added = {2014-04-28 16:51:07 +0000},
      Date-Modified = {2014-04-28 17:07:29 +0000},
      Month = {April},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2014-Sharma-ASOPFV.pdf},
      Title = {Automated Surgical {OSATS} Prediction from Videos},
      Year = {2014}}

Abstract

The assessment of surgical skills is an essential part of medical training. The prevalent manual evaluations by expert surgeons are time consuming and often their outcomes vary substantially from one observer to another. We present a video-based framework for automated evaluation of surgical skills based on the Objective Structured Assessment of Technical Skills (OSATS) criteria. We encode the motion dynamics via frame kernel matrices, and represent the motion granularity by texture features. Linear discriminant analysis is used to derive a reduced dimensionality feature space followed by linear regression to predict OSATS skill scores. We achieve statistically significant correlation (p-value < 0.01) between the ground-truth (given by domain experts) and the OSATS scores predicted by our framework.
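As a rough illustration of the scoring pipeline the abstract describes (linear discriminant analysis for dimensionality reduction followed by linear regression), here is a minimal scikit-learn sketch; the synthetic features stand in for the paper’s frame-kernel and texture descriptors, and the 1–5 score range is an assumption.

    # Minimal sketch of LDA + linear regression for skill-score prediction.
    # Synthetic features and placeholder OSATS scores are assumptions.
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 500))        # one motion/texture feature vector per video
    y = np.tile(np.arange(1, 6), 8)       # expert OSATS scores (placeholder labels)

    # LDA uses the discrete expert scores as class labels to find a low-dimensional space.
    X_low = LinearDiscriminantAnalysis(n_components=3).fit_transform(X, y)

    # Linear regression in the reduced space predicts a continuous skill score.
    y_pred = LinearRegression().fit(X_low, y).predict(X_low)

    r, p = pearsonr(y, y_pred)
    print(f"correlation with expert scores: r={r:.2f}, p={p:.3g}")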


Paper in IEEE CVPR 2013 “Decoding Children’s Social Behavior”

June 27th, 2013 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Denis Lantsman, Gregory Abowd, James Rehg, PAMI/ICCV/CVPR/ECCV, Papers, Thomas Ploetz

  • J. M. Rehg, G. D. Abowd, A. Rozga, M. Romero, M. A. Clements, S. Sclaroff, I. Essa, O. Y. Ousley, Y. Li, C. Kim, H. Rao, J. C. Kim, L. L. Presti, J. Zhang, D. Lantsman, J. Bidwell, and Z. Ye (2013), “Decoding Children’s Social Behavior,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
    @inproceedings{2013-Rehg-DCSB,
      Author = {James M. Rehg and Gregory D. Abowd and Agata Rozga and Mario Romero and Mark A. Clements and Stan Sclaroff and Irfan Essa and Opal Y. Ousley and Yin Li and Chanho Kim and Hrishikesh Rao and Jonathan C. Kim and Liliana Lo Presti and Jianming Zhang and Denis Lantsman and Jonathan Bidwell and Zhefan Ye},
      Booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}},
      Date-Added = {2013-06-25 11:47:42 +0000},
      Date-Modified = {2014-04-28 17:08:51 +0000},
      Doi = {10.1109/CVPR.2013.438},
      Month = {June},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~rehg/Papers/Rehg_CVPR13.pdf},
      Title = {Decoding Children's Social Behavior},
      Url = {http://www.cbi.gatech.edu/mmdb/},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cbi.gatech.edu/mmdb/},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/CVPR.2013.438}}

Abstract

We introduce a new problem domain for activity recognition: the analysis of children’s social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1-2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.

Full database available from http://www.cbi.gatech.edu/mmdb/
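As a toy illustration of multi-modal activity recognition over per-session features, here is a hypothetical early-fusion sketch; the feature dimensions, the binary label, and the classifier choice are assumptions for illustration, not the paper’s method.

    # Hypothetical early-fusion sketch for video + audio behavior classification.
    # Feature dimensions, labels, and classifier are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n_sessions = 160
    video = rng.normal(size=(n_sessions, 128))    # e.g. pooled motion / gaze descriptors
    audio = rng.normal(size=(n_sessions, 64))     # e.g. prosody / MFCC statistics
    labels = rng.integers(0, 2, size=n_sessions)  # e.g. a binary engagement rating

    X = np.hstack([video, audio])                 # concatenate modalities per session
    scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                             X, labels, cv=5)
    print("cross-validated accuracy:", scores.mean())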

via IEEE Xplore – Decoding Children’s Social Behavior.


Paper in IEEE CVPR 2013 “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition”

June 27th, 2013 Irfan Essa Posted in Activity Recognition, Behavioral Imaging, Grant Schindler, PAMI/ICCV/CVPR/ECCV, Papers, Sports Visualization, Thomas Ploetz, Vinay Bettadapura

  • V. Bettadapura, G. Schindler, T. Ploetz, and I. Essa (2013), “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
    @inproceedings{2013-Bettadapura-ABDDTSIAR,
      Author = {Vinay Bettadapura and Grant Schindler and Thomas Ploetz and Irfan Essa},
      Booktitle = {{Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}},
      Date-Added = {2013-06-25 11:42:31 +0000},
      Date-Modified = {2014-04-28 17:10:00 +0000},
      Doi = {10.1109/CVPR.2013.338},
      Month = {June},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Bettadapura-ABDDTSIAR.pdf},
      Title = {Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition},
      Url = {http://www.cc.gatech.edu/cpl/projects/abow/},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/abow/},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/CVPR.2013.338}}

Abstract

We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.
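To make the idea concrete, here is a small sketch in the spirit of the abstract: a bag-of-words count vector is augmented with bigram (ordering) counts and with match counts of randomly sampled regular expressions over a discretized event string. The event alphabet and the pattern shape are assumptions, not the authors’ exact encoding.

    # Sketch: augment BoW counts with ordering information and randomly
    # sampled regular-expression matches. Event alphabet is an assumption.
    import re
    import random
    from collections import Counter

    def bow_counts(events, vocab):
        c = Counter(events)
        return [c[w] for w in vocab]

    def bigram_counts(events, vocab):
        c = Counter(zip(events, events[1:]))
        return [c[(a, b)] for a in vocab for b in vocab]

    def random_regex_counts(event_string, vocab, n_patterns=5, seed=0):
        rng = random.Random(seed)
        feats = []
        for _ in range(n_patterns):
            a, b = rng.choice(vocab), rng.choice(vocab)
            pattern = f"{a}.*?{b}"          # e.g. "a, eventually followed by c"
            feats.append(len(re.findall(pattern, event_string)))
        return feats

    vocab = ["a", "b", "c"]                 # discretized event labels (assumed)
    events = list("abacbcabc")              # one activity as an event sequence
    features = (bow_counts(events, vocab)
                + bigram_counts(events, vocab)
                + random_regex_counts("".join(events), vocab))
    print(features)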

via IEEE Xplore – Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition.


Paper in AISTATS 2013 “Beyond Sentiment: The Manifold of Human Emotions”

April 29th, 2013 Irfan Essa Posted in AAAI/IJCAI/UAI, Behavioral Imaging, Computational Journalism, Numerical Machine Learning, Papers, WWW

  • S. Kim, F. Li, G. Lebanon, and I. A. Essa (2013), “Beyond Sentiment: The Manifold of Human Emotions,” in Proceedings of AISTATS, 2013. [PDF] [BIBTEX]
    @inproceedings{2012-Kim-BSMHE,
      Author = {Seungyeon Kim and Fuxin Li and Guy Lebanon and Irfan A. Essa},
      Booktitle = {Proceedings of AISTATS},
      Date-Added = {2013-06-25 12:01:11 +0000},
      Date-Modified = {2013-06-25 12:02:53 +0000},
      Pdf = {http://arxiv.org/pdf/1202.1568v1},
      Title = {Beyond Sentiment: The Manifold of Human Emotions},
      Year = {2013}}

Abstract

Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without manifold, we are also able to visualize different notions of positive sentiment in different domains.
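As a toy illustration of the manifold idea, the sketch below embeds a handful of documents into a continuous two-dimensional space rather than assigning them a binary sentiment label; the corpus and the TF-IDF-plus-Isomap choices are assumptions, not the paper’s model.

    # Sketch: place documents on a continuous low-dimensional manifold instead
    # of a binary positive/negative label. Corpus and embedding are assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.manifold import Isomap

    docs = [
        "what a wonderful, joyful day",
        "i am so angry about this delay",
        "feeling calm and a little tired",
        "that movie was terrifying and sad",
    ]

    X = TfidfVectorizer().fit_transform(docs).toarray()
    coords = Isomap(n_neighbors=2, n_components=2).fit_transform(X)
    for doc, (u, v) in zip(docs, coords):
        print(f"({u:+.2f}, {v:+.2f})  {doc}")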

via [arXiv.org 1202.1568] Beyond Sentiment: The Manifold of Human Emotions.
