
Paper in IEEE CVPR 2013 “Decoding Children’s Social Behavior”

June 27th, 2013 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Denis Lantsman, Gregory Abowd, James Rehg, PAMI/ICCV/CVPR/ECCV, Papers, Thomas Ploetz

  • J. M. Rehg, G. D. Abowd, A. Rozga, M. Romero, M. A. Clements, S. Sclaroff, I. Essa, O. Y. Ousley, Y. Li, C. Kim, H. Rao, J. C. Kim, L. L. Presti, J. Zhang, D. Lantsman, J. Bidwell, and Z. Ye (2013), “Decoding Children’s Social Behavior,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
    @inproceedings{2013-Rehg-DCSB,
      Author = {James M. Rehg and Gregory D. Abowd and Agata Rozga and Mario Romero and Mark A. Clements and Stan Sclaroff and Irfan Essa and Opal Y. Ousley and Yin Li and Chanho Kim and Hrishikesh Rao and Jonathan C. Kim and Liliana Lo Presti and Jianming Zhang and Denis Lantsman and Jonathan Bidwell and Zhefan Ye},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2013-06-25 11:47:42 +0000},
      Date-Modified = {2013-10-22 18:50:31 +0000},
      Doi = {10.1109/CVPR.2013.438},
      Month = {June},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~rehg/Papers/Rehg_CVPR13.pdf},
      Title = {Decoding Children's Social Behavior},
      Url = {http://www.cbi.gatech.edu/mmdb/},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cbi.gatech.edu/mmdb/},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/CVPR.2013.438}}

Abstract

We introduce a new problem domain for activity recognition: the analysis of children’s social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1-2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.

Full database available from http://www.cbi.gatech.edu/mmdb/

via IEEE Xplore – Decoding Children’s Social Behavior.
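The abstract mentions preliminary multi-modal activity recognition results. As a loose illustration of how per-modality features can be combined, here is a minimal feature-level fusion sketch in Python; the feature arrays, labels, and classifier are synthetic placeholders, not the MMDB annotations or the methods evaluated in the paper.

    # Illustrative feature-level fusion for multi-modal activity recognition.
    # All arrays below are synthetic placeholders (hypothetical features and labels),
    # not the MMDB data or the classifiers evaluated in the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_sessions = 200

    video_feats = rng.normal(size=(n_sessions, 32))   # placeholder video descriptors
    audio_feats = rng.normal(size=(n_sessions, 16))   # placeholder audio/prosody descriptors

    # Synthetic labels tied to one feature so the classifier has something to learn.
    labels = (video_feats[:, 0] + audio_feats[:, 0] > 0).astype(int)

    # Feature-level ("early") fusion: concatenate modalities, train a single classifier.
    fused = np.hstack([video_feats, audio_feats])
    X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.25, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))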


Paper: PUI (1997) “Prosody Analysis for Speaker Affect Determination”

October 12th, 1997 Irfan Essa Posted in Affective Computing, Papers

Andrew Gardner and Irfan Essa (1997), “Prosody Analysis for Speaker Affect Determination,” in Proceedings of the Perceptual User Interfaces Workshop (PUI 1997), Banff, Alberta, Canada, October 1997. [PDF] [Project Site]

Abstract

Speech is a complex waveform containing verbal (e.g. phoneme, syllable, and word) and nonverbal (e.g. speaker identity, emotional state, and tone) information. Both the verbal and nonverbal aspects of speech are extremely important in interpersonal communication and human-machine interaction. However, work in machine perception of speech has focused primarily on the verbal, or content-oriented, goals of speech recognition, speech compression, and speech labeling. Usage of nonverbal information has been limited to speaker identification applications. While the success of research in these areas is well documented, this success is fundamentally limited by the effect of nonverbal information on the speech waveform. The extra-linguistic aspect of speech is considered a source of variability that theoretically can be minimized with an appropriate preprocessing technique; determination of such robust techniques is, however, far from trivial.

It is widely believed in the speech processing community that the nonverbal component of speech contains higher-level information that provides cues for auditory scene analysis, speech understanding, and the determination of a speaker’s psychological state or conversational tone. We believe that the identification of such nonverbal cues can improve the performance of classic speech processing tasks and will be necessary for the realization of natural, robust human-computer speech interfaces. In this paper we seek to address the problem of how to systematically analyze the nonverbal aspect of the speech waveform to determine speaker affect, specifically by analyzing the pitch contour.
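As a rough, modern-day illustration of the kind of pitch-contour analysis the abstract refers to, the sketch below extracts a fundamental-frequency contour with librosa’s pYIN tracker and computes a few coarse prosodic statistics. The file name is a placeholder, librosa postdates the paper, and these statistics are generic summaries rather than the affect features developed in the work.

    # Minimal pitch-contour extraction using librosa's pYIN tracker, followed by
    # a few coarse prosodic statistics. The file name is a placeholder, and these
    # statistics are generic summaries rather than the paper's affect features.
    import numpy as np
    import librosa

    y, sr = librosa.load("utterance.wav", sr=None)  # hypothetical input recording

    # Frame-by-frame fundamental-frequency (pitch) estimate; NaN where unvoiced.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    voiced_f0 = f0[voiced_flag]  # keep only frames detected as voiced

    print("mean pitch (Hz):  %.1f" % np.nanmean(voiced_f0))
    print("pitch range (Hz): %.1f" % (np.nanmax(voiced_f0) - np.nanmin(voiced_f0)))
    print("pitch std (Hz):   %.1f" % np.nanstd(voiced_f0))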


Paper: IEEE PAMI (1997) “Coding, analysis, interpretation, and recognition of facial expressions”

July 14th, 1997 Irfan Essa Posted in Affective Computing, Face and Gesture, PAMI/ICCV/CVPR/ECCV, Papers, Research, Sandy Pentland

Coding, analysis, interpretation, and recognition of facial expressions

I. A. Essa and A. P. Pentland (1997), “Coding, Analysis, Interpretation, and Recognition of Facial Expressions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 757-763, July 1997. ISSN: 0162-8828.
Digital Object Identifier: 10.1109/34.598232

Abstract

We describe a computer vision system for observing facial motion by using an optimal estimation optical flow method coupled with geometric, physical and motion-based dynamic models describing the facial structure. Our method produces a reliable parametric representation of the face’s independent muscle action groups, as well as an accurate estimate of facial motion. Previous efforts at analysis of facial expression have been based on the facial action coding system (FACS), a representation developed in order to allow human psychologists to code expression from static pictures. To avoid use of this heuristic coding scheme, we have used our computer vision system to probabilistically characterize facial motion and muscle activation in an experimental population, thus deriving a new, more accurate, representation of human facial expressions that we call FACS+. Finally, we show how this method can be used for coding, analysis, interpretation, and recognition of facial expressions.
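To give a concrete sense of the motion-estimation building block, here is a minimal dense optical-flow sketch using OpenCV’s Farneback method between two face frames. OpenCV postdates the paper and this is a generic, off-the-shelf estimator with placeholder file names, not the optimal-estimation flow coupled to the physics-based muscle models described above.

    # Dense optical flow between two face frames using OpenCV's Farneback method.
    # Generic illustration of the motion-estimation step only; not the paper's
    # optimal-estimation flow or physics-based muscle model. File names are placeholders.
    import cv2
    import numpy as np

    prev = cv2.imread("face_t0.png", cv2.IMREAD_GRAYSCALE)  # frame at time t
    curr = cv2.imread("face_t1.png", cv2.IMREAD_GRAYSCALE)  # frame at time t+1

    # flow[y, x] holds the (dx, dy) displacement of each pixel between the frames.
    flow = cv2.calcOpticalFlowFarneback(
        prev, curr, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0
    )

    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    print("mean motion magnitude (pixels/frame): %.3f" % float(np.mean(magnitude)))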


Scientific American Article (1996): “Smart Rooms” by Alex Pentland

April 9th, 1996 Irfan Essa Posted in Affective Computing, Face and Gesture, In The News, Intelligent Environments, Research

Alex Pentland (1996), “Smart Rooms,” Scientific American, April 1996.

Quote from the Article: “Facial expression is almost as important as identity. A teaching program, for example, should know if its students look bored. So once our smart room has found and identified someone’s face, it analyzes the expression. Yet another computer compares the facial motion the camera records with maps depicting the facial motions involved in making various expressions. Each expression, in fact, involves a unique collection of muscle movements. When you smile, you curl the corners of your mouth and lift certain parts of your forehead; when you fake a smile, though, you move only your mouth. In experiments conducted by scientist Irfan A. Essa and me, our system has correctly judged expressions-among a small group of subjects-98 percent of the time.”


Discover Magazine Article (1995): “A Face of One’s Own | Memory, Emotions, & Decisions”

December 1st, 1995 Irfan Essa Posted in Affective Computing, Face and Gesture, In The News, Research

Evan I. Schwartz (1995), “A Face of One’s Own | Memory, Emotions, & Decisions,” DISCOVER Magazine, December 1, 1995.

Quote from the Article: “Chief among the members of his staff working on the problem is computer scientist Irfan Essa. To get computers to read facial expressions such as happiness or anger, Essa has designed three-dimensional animated models of common facial movements. His animated faces move according to biomedical data gathered from facial surgeons and anatomists. Essa uses this information to simulate exactly what happens when a person’s static, expressionless face, whose muscles are completely relaxed and free of stress, breaks out into a laugh or a frown or some other expression of emotion.”
