<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Irfan Essa&#039;s Academic Activities &#187; Computer Vision</title>
	<atom:link href="http://prof.irfanessa.com/tag/computer-vision/feed/" rel="self" type="application/rss+xml" />
	<link>http://prof.irfanessa.com</link>
	<description>Academic/Professional Activities</description>
	<lastBuildDate>Thu, 01 Apr 2010 15:31:12 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=abc</generator>
		<item>
		<title>Paper Advanced Robotics (2009): &#8220;Human Action Recognition Using Global Point Feature Histograms and Action Shapes&#8221;</title>
		<link>http://prof.irfanessa.com/2009/10/29/paper-advanced-robotics-2009/</link>
		<comments>http://prof.irfanessa.com/2009/10/29/paper-advanced-robotics-2009/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 14:58:56 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Franzi Meier]]></category>
		<category><![CDATA[Intelligent Environments]]></category>
		<category><![CDATA[Michael Beetz]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[2009]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://prof.irfanessa.com/?p=622</guid>
		<description><![CDATA[Radu Bogdan Rusu, Jan Bandouch, Franziska Meier, Irfan Essa and Michael Beetz (2009) &#8220;Human Action Recognition Using Global Point Feature Histograms and Action Shapes&#8221;, in Journal of Advanced Robotics, volume 23, pages 1873–1908, Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009. [ DOI &#124; PDF] Abstract This paper investigates the recognition of [...]]]></description>
			<content:encoded><![CDATA[<p>Radu Bogdan Rusu, Jan Bandouch, Franziska Meier, Irfan Essa and Michael Beetz (2009) &#8220;Human Action Recognition Using Global Point Feature Histograms and Action Shapes&#8221;, in Journal of Advanced Robotics, volume 23, pages 1873–1908, Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009. [<a href="http://dx.doi.org/10.1163/016918609X12518783330243">DOI</a> | PDF]</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">This paper investigates the recognition of human actions from three-dimensional (3-D) point clouds that encode the motions of people acting in sensor-distributed indoor environments. Data streams are time sequences of silhouettes extracted from cameras in the environment. From the 2-D silhouette contours we generate space–time streams by continuously aligning and stacking the contours along the time axis as third spatial dimension. The space–time stream of an observation sequence is segmented into parts corresponding to subactions using a pattern matching technique based on suffix trees and interval scheduling. Then, the segmented space–time shapes are processed by treating the shapes as 3-D point clouds and estimating global point feature histograms for them. The resultant models are clustered using statistical analysis and our experimental results indicate that the presented methods robustly derive different action classes. This holds despite large intra-class variance in the recorded datasets due to performances from different persons at different time intervals.</p>
<p style="text-align: justify;">© Koninklijke Brill NV, Leiden and The Robotics Society of Japan, 2009</p>
<p style="text-align: justify;">
<div id="attachment_625" class="wp-caption aligncenter" style="width: 512px"><img class="size-large wp-image-625  " title="2009-Rusu-etal-AR23-B" src="http://prof.irfanessa.com/wp-content/uploads/2009/10/2009-Rusu-etal-AR23-B-1024x193.png" alt="Overview of the approach." width="502" height="95" /><p class="wp-caption-text">Overview of the approach.</p></div>
<p><strong>Keywords: </strong>Action recognition, point cloud, global features, action segmentation</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2009/10/29/paper-advanced-robotics-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper ISMAR 2009 (IEEE International Symposium on Mixed and Augmented Reality): &#8220;Augmenting Aerial Earth Maps with Dynamic Information&#8221;</title>
		<link>http://prof.irfanessa.com/2009/10/20/paper-2009-in-ismar-ieee-international-symposium-on-mixed-and-augmented-reality-augmenting-aerial-earth-maps-with-dynamic-information/</link>
		<comments>http://prof.irfanessa.com/2009/10/20/paper-2009-in-ismar-ieee-international-symposium-on-mixed-and-augmented-reality-augmenting-aerial-earth-maps-with-dynamic-information/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 23:07:42 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Computational Journalism]]></category>
		<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[Kihwan Kim]]></category>
		<category><![CDATA[Modeling and Animation]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[2009]]></category>
		<category><![CDATA[Animation]]></category>
		<category><![CDATA[CnJ]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Crowdsourcing]]></category>

		<guid isPermaLink="false">http://prof.irfanessa.com/?p=556</guid>
		<description><![CDATA[Kihwan Kim, Sangmin Oh, Jeonggyu Lee and Irfan Essa (2009), &#8220;Augmenting Aerial Earth Maps with Dynamic Information,&#8221; In Proceedings of IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Orlando, FL, USA, October 2009 [Project Site, Video (AVI/DiVX), Video (Youtube), Paper (pdf)]. Abstract We introduce methods for augmenting aerial visualizations of Earth (from tools such [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Kihwan Kim, Sangmin Oh, Jeonggyu Lee and Irfan Essa (2009), &#8220;Augmenting Aerial Earth Maps with Dynamic Information,&#8221; In Proceedings of <em>IEEE International Symposium on Mixed and Augmented Reality (ISMAR)</em>, Orlando, FL, USA, October 2009 [<a href="http://www.cc.gatech.edu/cpl/projects/augearth/" target="_blank">Project Site</a>, <a href="http://www.kihwan23.com/augearth/augearth_ismar09_kim.avi">Video (AVI/DiVX)</a>, <a href="http://www.youtube.com/watch?v=TPk88soc2qw">Video (Youtube)</a>, <a href="http://www.cc.gatech.edu/cpl/projects/augearth/augearth_ismar_reduce.pdf">Paper (pdf)</a>].</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">We introduce methods for augmenting aerial visualizations of Earth (from tools such as Google Earth or Microsoft Virtual Earth) with dynamic information obtained from videos. Our goal is to make Augmented Earth Maps that visualize the live broadcast of dynamic sceneries within a city. We propose different approaches to analyze videos of pedestrians and cars under differing conditions, and then augment Aerial Earth Maps (AEMs) with live and dynamic information. We also analyze natural phenomena (clouds) and project information from these onto the AEMs to add visual realism.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/TPk88soc2qw&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/TPk88soc2qw&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2009/10/20/paper-2009-in-ismar-ieee-international-symposium-on-mixed-and-augmented-reality-augmenting-aerial-earth-maps-with-dynamic-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://www.kihwan23.com/augearth/augearth_ismar09_kim.avi" length="81762078" type="video/x-msvideo" />
		</item>
		<item>
		<title>Event (2009): IEEE Workshop on Computer Vision for Humanoid Robots in Real Environments</title>
		<link>http://prof.irfanessa.com/2009/09/23/event-2009-ieee-workshop-on-computer-vision-for-humanoid-robots-in-real-environment/</link>
		<comments>http://prof.irfanessa.com/2009/09/23/event-2009-ieee-workshop-on-computer-vision-for-humanoid-robots-in-real-environment/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 15:30:28 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[2009]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Humanoids]]></category>
		<category><![CDATA[Robotics]]></category>

		<guid isPermaLink="false">http://prof.irfanessa.com/?p=560</guid>
		<description><![CDATA[IEEE Workshop on Computer Vision for Humanoid Robots in Real Environments. I am co-organizing the First IEEE Workshop on Computer Vision for Humanoids in conjunction with the ICCV Conference in Kyoto, Japan. This workshop will be held on September 27, 2009 (9:30am &#8211; 6:00pm). The goal of this workshop is to bring together experts from the fields [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://humanoidscv.ime.cmc.osaka-u.ac.jp/">IEEE Workshop on Computer Vision for Humanoid Robots in Real Environments</a>.</p>
<p>I am co-organizing the First IEEE Workshop on Computer Vision for Humanoids in conjunction with the ICCV Conference in Kyoto, Japan. This workshop will be held on September 27, 2009 (9:30am &#8211; 6:00pm).</p>
<p style="line-height: 20px;">The goal of this workshop is to bring together experts from the fields of computer vision and robotics who are working on humanoid robots with vision as one of the primary modalities. Topics of interest include, but are not limited to:</p>
<ul class="list1" style="margin-top: 0.5em; margin-bottom: 0.5em; line-height: 20px; list-style-type: disc; padding-left: 16px; margin-left: 16px;">
<li>Visual Learning in Robots</li>
<li>Human Robot Interaction</li>
<li>Grasping and Manipulation</li>
<li>Learning by Demonstration</li>
<li>Task Learning for Robots</li>
<li>Activity Recognition and Discovery for Robots</li>
<li>Humanoid Navigation in Real Environments</li>
<li>Vision Devices and Systems for Robot Applications</li>
<li>Application of Humanoid Robots (Indoor/Outdoor, Entertainment)</li>
</ul>
<p style="line-height: 20px;">This is the first attempt at a workshop that crosses from Humanoids Research to Computer Vision Research.</p>
<p style="line-height: 20px;">The workshop includes six invited talks as well as an open poster session, where all participants are expected to present a poster describing their recent work.</p>
<p>Location: Kyoto University, Faculty of Engineering Bldg.#3, 2F, Room W201, in conjunction with ICCV. (See http://www.iccv2009.org/workshops/index.html).</p>
<p>For schedule, abstracts and other information, see the workshop <a href="http://humanoidscv.ime.cmc.osaka-u.ac.jp/">website</a>. More information about ICCV at http://www.iccv2009.org/.</p>
<div id="attachment_573" class="wp-caption aligncenter" style="width: 310px"><img class="size-medium wp-image-573" title="20090927_1491" src="http://prof.irfanessa.com/wp-content/uploads/2009/09/20090927_1491-300x168.jpg" alt="Invited Speakers and Organizers after the Workshop" width="300" height="168" /><p class="wp-caption-text">Invited Speakers and Organizers after the Workshop</p></div>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2009/09/23/event-2009-ieee-workshop-on-computer-vision-for-humanoid-robots-in-real-environment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Presentation at International Workshop on Video (2009): &#8220;Temporal Representations of Video for Analysis and Synthesis&#8221;</title>
		<link>http://prof.irfanessa.com/2009/05/26/presentation-at-international-workshop-on-video-2009-temporal-representations-of-video-for-analysis-and-synthesis/</link>
		<comments>http://prof.irfanessa.com/2009/05/26/presentation-at-international-workshop-on-video-2009-temporal-representations-of-video-for-analysis-and-synthesis/#comments</comments>
		<pubDate>Tue, 26 May 2009 14:37:32 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[2009]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/?p=517</guid>
		<description><![CDATA[&#8220;Temporal Representations of Video for Analysis and Synthesis&#8221; at IWV09: International Workshop on Video, In Barcelona, SPAIN, May 25-27, 2009. (Slides, NO Video) Abstract I will present a variety of temporal models of video that we have been studying (and developing on) for analysis and synthesis of video. For synthesis of videos, we have been developing representations [...]]]></description>
			<content:encoded><![CDATA[<h3 style="text-align: center;">&#8220;Temporal Representations of Video for Analysis and Synthesis&#8221; at <a href="http://research.microsoft.com/en-us/um/india/events/iwv09/index.html">IWV09: International Workshop on Video</a>, In Barcelona, SPAIN, May 25-27, 2009.</h3>
<p style="text-align: center;">(<a href="http://www.cc.gatech.edu/~irfan/presentations/2009/IWV2009-Barcelona.html" target="_blank">Slides, NO Video</a>)</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">I will present a variety of temporal models of video that we have been studying (and developing on) for analysis and synthesis of video. For synthesis of videos, we have been developing representations that support example-based re-synthesis and spatio-temporal re-targeting. These approaches build on graph-based methods and we present techniques for similarity metrics for video, segmentation in video, and merging of different video streams. I will showcase a series of examples of these approaches applied to generate new videos.</p>
<p style="text-align: justify;">For analysis of videos, we have developed a series of representations to observe and model activities in videos. Building on low-level measures of movement and motion in videos, we have incorporated higher-level temporal generative models to represent and recognize observed activities. I will discuss the strengths of a variety of State-based, Markovian, Grammar-based and Network-based representations that we have employed for recognizing activities from video. I will also discuss approaches for unsupervised discovery and recognition of activities.</p>
<p style="text-align: justify;">Time permitting, I will describe some new efforts that move towards understanding mobile imaging and video, and video authoring and video on the web. Within these, I will discuss issues of collaborative imaging, collective authoring, ad-hoc sensor networks, and peer production with images and videos. Using these concepts to focus the conversation, I will discuss how all of these issues are impacting the field of Journalism and Reporting and how we have started a new interdisciplinary research and education effort that we call Computational Journalism.</p>
<p style="text-align: justify;"><img class="aligncenter" src="http://research.microsoft.com/en-us/um/india/events/iwv09/header.jpg" alt="" width="520" height="100" /></p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2009/05/26/presentation-at-international-workshop-on-video-2009-temporal-representations-of-video-for-analysis-and-synthesis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper (2009) In ACM Symposium on Interactive 3D Graphics &#8220;Human Video Textures&#8221;</title>
		<link>http://prof.irfanessa.com/2009/03/01/paper-2009-acm-symposium-on-interactive-human-video-textures/</link>
		<comments>http://prof.irfanessa.com/2009/03/01/paper-2009-acm-symposium-on-interactive-human-video-textures/#comments</comments>
		<pubDate>Sun, 01 Mar 2009 19:43:45 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[ACM SIGGRAPH]]></category>
		<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[James Rehg]]></category>
		<category><![CDATA[Matt Flagg]]></category>
		<category><![CDATA[Modeling and Animation]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Sing Bing Kang]]></category>
		<category><![CDATA[2009]]></category>
		<category><![CDATA[Animation]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Motion Capture]]></category>
		<category><![CDATA[Video Textures]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/?p=473</guid>
		<description><![CDATA[Matthew Flagg, Atsushi Nakazawa, Qiushuang Zhang, Sing Bing Kang, Young Kee Ryu, Irfan Essa, James M. Rehg (2009), Human Video Textures In Proceedings of the ACM Symposium on Interactive 3D Graphics and Games 2009 (I3D ’09), Boston, MA, February 27-March 1 (Fri-Sun), 2009 [PDF (see Copyright) &#124; Video in DiVx &#124; Website ] Abstract This paper describes a data-driven approach for [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://www.cc.gatech.edu/~mflagg">Matthew Flagg</a>, <a href="http://www.ime.cmc.osaka-u.ac.jp/~nakazawa/wiki/">Atsushi Nakazawa</a>, Qiushuang Zhang, <a href="http://research.microsoft.com/en-us/people/sbkang/">Sing Bing Kang</a>, Young Kee Ryu, <a href="http://www.irfanessa.com/">Irfan Essa</a>, <a href="http://www.cc.gatech.edu/~rehg">James M. Rehg</a> (2009), <a href="http://www.cc.gatech.edu/cpl/projects/humanvideotextures/">Human Video Textures</a> In Proceedings of the ACM Symposium on Interactive 3D Graphics and Games 2009 (<a href="http://graphics.cs.williams.edu/i3d09/" target="_blank">I3D ’09</a>), Boston, MA, February 27-March 1 (Fri-Sun), 2009 [<a href="http://www.cc.gatech.edu/cpl/projects/humanvideotextures/HVT.pdf" target="_blank">PDF</a> (see <a href="./copyright" target="_blank">Copyright</a>) | <a href="http://www.cc.gatech.edu/cpl/projects/humanvideotextures/hvt-i3d.avi">Video</a> in DiVx | Website ]</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">This paper describes a data-driven approach for generating photorealistic animations of human motion. Each animation sequence follows a user-choreographed path and plays continuously by seamlessly transitioning between different segments of the captured data. To produce these animations, we capitalize on the complementary characteristics of motion capture data and video. We customize our capture system to record motion capture data that are synchronized with our video source. Candidate transition points in video clips are identified using a new similarity metric based on 3-D marker trajectories and their 2-D projections into video. Once the transitions have been identified, a video-based motion graph is constructed. We further exploit hybrid motion and video data to ensure that the transitions are seamless when generating animations. Motion capture marker projections serve as control points for segmentation of layers and nonrigid transformation of regions. This allows warping and blending to generate seamless in-between frames for animation. We show a series of choreographed animations of walks and martial arts scenes as validation of our approach.</p>
<div class="wp-caption aligncenter" style="width: 514px"><span style="text-decoration: underline;"><img class="   aligncenter" title="Human Video Textures" src="http://www.cc.gatech.edu/cpl/projects/humanvideotextures/graphics/teaser.png" alt="Example Image from Project" width="504" height="156" /> </span><p class="wp-caption-text">Human Video Textures (Output Rendered as a Collage!)</p></div>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2009/03/01/paper-2009-acm-symposium-on-interactive-human-video-textures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://www.cc.gatech.edu/cpl/projects/humanvideotextures/hvt-i3d.avi" length="75996996" type="video/x-msvideo" />
		</item>
		<item>
		<title>Paper: ICPR (2008) &#8220;3D Shape Context and Distance Transform for Action Recognition&#8221;</title>
		<link>http://prof.irfanessa.com/2008/12/08/paper-icpr-2008-3d-shape-context-and-distance-transform-for-action-recognition/</link>
		<comments>http://prof.irfanessa.com/2008/12/08/paper-icpr-2008-3d-shape-context-and-distance-transform-for-action-recognition/#comments</comments>
		<pubDate>Mon, 08 Dec 2008 20:22:26 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Aware Home]]></category>
		<category><![CDATA[Face and Gesture]]></category>
		<category><![CDATA[Franzi Meier]]></category>
		<category><![CDATA[Matthias Grundmann]]></category>
		<category><![CDATA[PAMI/ICCV/CVPR/ECCV]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[2008]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/?p=146</guid>
		<description><![CDATA[M. Grundmann, F. Meier, and I. Essa (2008) &#8220;3D Shape Context and Distance Transform for Action Recognition&#8221;, In Proceedings of International Conference on Pattern Recognition (ICPR) 2008, Tampa, FL. [Project Page &#124; DOI &#124; PDF] ABSTRACT We propose the use of 3D (2D+time) Shape Context to recognize the spatial and temporal details inherent in human [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">M. Grundmann, F. Meier, and I. Essa (2008) &#8220;3D Shape Context and Distance Transform for Action Recognition&#8221;, In <em>Proceedings of <a href="http://www.icpr2008.org/" target="_blank">International Conference on Pattern Recognition</a></em> (ICPR) 2008, Tampa, FL. [<a href="http://www.mgrundmann.com/icpr2008.html" target="_blank">Project Page</a> | <a href="http://dx.doi.org/10.1109/ICPR.2008.4761435" target="_blank">DOI</a> | <a href="http://www.mgrundmann.com/pdfs/icpr2008.pdf" target="_blank">PDF</a>]</p>
<p style="text-align: center;">ABSTRACT</p>
<p style="text-align: justify;"><a href="http://academics.irfanessa.com/wp-content/uploads/2008/08/3dfigure_feat_small.png"><img class="alignleft size-medium wp-image-163" title="3dfigure_feat_small" src="http://academics.irfanessa.com/wp-content/uploads/2008/08/3dfigure_feat_small-300x179.png" alt="" width="300" height="179" /></a>We propose the use of 3D (2D+time) Shape Context to recognize the spatial and temporal details inherent in human actions. We represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time. A non-uniform sampling method is introduced that gives preference to fast moving body parts using a Euclidean 3D Distance Transform. Actions are then classified by matching the extracted point clouds. Our proposed approach is based on a global matching and does not require specific training to learn the model. We test the approach thoroughly on two publicly available datasets and compare to several state-of-the-art methods. The achieved classification accuracy is on par with or superior to the best results reported to date.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2008/12/08/paper-icpr-2008-3d-shape-context-and-distance-transform-for-action-recognition/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Paper: ICCV 2007, &#8220;Structure from Statistics &#8211; Unsupervised Activity Analysis using Suffix Trees&#8221;</title>
		<link>http://prof.irfanessa.com/2007/10/15/paper-iccv-2007-structure-from-statistics-unsupervised-activity-analysis-using-suffix-trees/</link>
		<comments>http://prof.irfanessa.com/2007/10/15/paper-iccv-2007-structure-from-statistics-unsupervised-activity-analysis-using-suffix-trees/#comments</comments>
		<pubDate>Mon, 15 Oct 2007 19:56:31 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Aaron Bobick]]></category>
		<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Aware Home]]></category>
		<category><![CDATA[PAMI/ICCV/CVPR/ECCV]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Raffay Hamid]]></category>
		<category><![CDATA[2007]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://essa.org/irfan/wp/?p=31</guid>
		<description><![CDATA[R. Hamid, S. Maddi, A. Bobick, I. Essa (2007). Structure from Statistics &#8211; Unsupervised Activity Analysis using Suffix Trees, at the International Conference on Computer Vision 2007. October 2007, Rio de Janeiro, BRAZIL Abstract Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li>R. Hamid, S. Maddi, A. Bobick, I. Essa (2007). <a href="http://www.cc.gatech.edu/%7Eraffay/hamid_iccv_07.pdf">Structure from Statistics &#8211; Unsupervised Activity Analysis using Suffix Trees</a>, at the <a href="http://iccv2007.rutgers.edu/">International Conference on Computer Vision 2007</a>. October 2007, Rio de Janeiro, BRAZIL</li>
</ul>
<p style="text-align: center"><strong>Abstract</strong></p>
<p><a href="http://academics.irfanessa.com/wp-content/uploads/2008/05/iccv07-fig.jpg"><img class="alignleft size-medium wp-image-132" style="float: left; margin: 5px;" title="ICCV07-SuffixTreeFig" src="http://academics.irfanessa.com/wp-content/uploads/2008/05/iccv07-fig-300x168.jpg" alt="" width="300" height="168" /></a>Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity, and ability to capture activity structure only up to some fixed temporal scale. In this work, we propose Suffix Trees as an activity representation to efficiently extract structure of activities by analyzing their constituent event-subsequences over multiple temporal scales. We empirically compare Suffix Trees with some of the previous approaches in terms of feature cardinality, discriminative prowess, noise sensitivity and activity-class discovery. Finally, exploiting properties of Suffix Trees, we present a novel perspective on anomalous subsequences of activities, and propose an algorithm to detect them in linear-time. We present comparative results over experimental data, collected from a kitchen environment to demonstrate the competence of our proposed framework.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2007/10/15/paper-iccv-2007-structure-from-statistics-unsupervised-activity-analysis-using-suffix-trees/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: IEEE CVPR (2007) &#8220;Tree-based Classifiers for Bilayer Video Segmentation&#8221;</title>
		<link>http://prof.irfanessa.com/2007/06/17/paper-ieee-cvpr-2007-tree-based-classifiers-for-bilayer-video-segmentation/</link>
		<comments>http://prof.irfanessa.com/2007/06/17/paper-ieee-cvpr-2007-tree-based-classifiers-for-bilayer-video-segmentation/#comments</comments>
		<pubDate>Sun, 17 Jun 2007 15:18:24 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Antonio Criminisi]]></category>
		<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[Funding]]></category>
		<category><![CDATA[John Winn]]></category>
		<category><![CDATA[NSF (0205507)]]></category>
		<category><![CDATA[Numerical Machine Learning]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Pei Yin]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[2007]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[CVPR]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2007/06/17/paper-ieee-cvpr-2007-tree-based-classifiers-for-bilayer-video-segmentation/</guid>
		<description><![CDATA[Pei Yin, Antonio Criminisi, John Winn and Irfan Essa (2007), Tree-based Classifiers for Bilayer Video Segmentation, In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR &#8217;07, 17-22 June 2007, page(s): 1 &#8211; 8, Location: Minneapolis, MN, USA, ISBN: 1-4244-1180-7, Digital Object Identifier: 10.1109/CVPR.2007.383008 Abstract This paper presents an algorithm for the automatic segmentation of monocular videos [...]]]></description>
			<content:encoded><![CDATA[<p>Pei Yin, Antonio Criminisi, John Winn and Irfan Essa (2007), <a href="http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=4270033&amp;isnumber=4269956&amp;punumber=4269955&amp;k2dockey=4270033@ieeecnfs&amp;query=%28%28essa%29%3Cin%3Eau+%29&amp;pos=6">Tree-based Classifiers for Bilayer Video Segmentation</a>, In Proceedings of <em>IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR &#8217;07</em>, 17-22 June 2007, page(s): 1 &#8211; 8, Location: Minneapolis, MN, USA, ISBN: 1-4244-1180-7, Digital Object Identifier: 10.1109/CVPR.2007.383008</p>
<p align="center"><strong>Abstract</strong></p>
<p style="text-align: justify;">This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, &#8220;motons&#8221;, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2007/06/17/paper-ieee-cvpr-2007-tree-based-classifiers-for-bilayer-video-segmentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: ACM IWVSSN (2006) &#8220;Unsupervised Analysis of Activity Sequences Using Event Motifs&#8221;</title>
		<link>http://prof.irfanessa.com/2006/10/23/paper-acm-iwvssn-2006-unsupervised-analysis-of-activity-sequences-using-event-motifs/</link>
		<comments>http://prof.irfanessa.com/2006/10/23/paper-acm-iwvssn-2006-unsupervised-analysis-of-activity-sequences-using-event-motifs/#comments</comments>
		<pubDate>Mon, 23 Oct 2006 22:59:37 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[AAAI/IJCAI/UAI]]></category>
		<category><![CDATA[Aaron Bobick]]></category>
		<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Aware Home]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Raffay Hamid]]></category>
		<category><![CDATA[Siddhartha Maddi]]></category>
		<category><![CDATA[2007]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2008/01/23/paper-acm-iwvssn-2006-unsupervised-analysis-of-activity-sequences-using-event-motifs/</guid>
		<description><![CDATA[R. Hamid, S. Maddi, A. Bobick, I. Essa. &#8220;Unsupervised Analysis of Activity Sequences Using Event Motifs&#8221;, In proceedings of 4th ACM International Workshop on Video Surveillance and Sensor Networks (in conjunction with ACM Multimedia 2006). Abstract We present an unsupervised framework to discover characterizations of everyday human activities, and demonstrate how such representations can be [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li>R. Hamid, S. Maddi, A. Bobick, I. Essa.  		&#8220;Unsupervised Analysis of Activity Sequences Using Event Motifs&#8221;, In proceedings of  		4th ACM International Workshop on Video Surveillance and Sensor Networks  		(in conjunction with ACM Multimedia 2006).</li>
</ul>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">We present an unsupervised framework to discover characterizations of everyday human activities, and demonstrate how such representations can be used to extract points of interest in event-streams. We begin with the usage of Suffix Trees as an efficient activity-representation to analyze the global structural information of activities, using their local event statistics over the entire continuum of their temporal resolution. Exploiting this representation, we discover characterizing event-subsequences and present their usage in an ensemble-based framework for activity classification. Finally, we propose a method to automatically detect subsequences of events that are locally atypical in a structural sense. Results over extensive data-sets, collected from multiple sensor-rich environments, are presented to show the competence and scalability of the proposed framework.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2006/10/23/paper-acm-iwvssn-2006-unsupervised-analysis-of-activity-sequences-using-event-motifs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: IEEE CVPR (2006) &#8220;Learning Temporal Sequence Model from Partially Labeled Data&#8221;</title>
		<link>http://prof.irfanessa.com/2006/06/14/ieeexplore-learning-temporal-sequence-model-from-partially-labeled-data/</link>
		<comments>http://prof.irfanessa.com/2006/06/14/ieeexplore-learning-temporal-sequence-model-from-partially-labeled-data/#comments</comments>
		<pubDate>Wed, 14 Jun 2006 17:06:45 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Aaron Bobick]]></category>
		<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Aware Home]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Yifan Shi]]></category>
		<category><![CDATA[2006]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2006/06/14/ieeexplore-learning-temporal-sequence-model-from-partially-labeled-data/</guid>
		<description><![CDATA[Yifan Shi, Bobick, A., Essa, I. (2006), &#8220;Learning Temporal Sequence Model from Partially Labeled Data&#8221;, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006 Volume: 2, page(s): 1631 &#8211; 1638, ISSN: 1063-6919, ISBN: 0-7695-2597-0, Digital Object Identifier: 10.1109/CVPR.2006.174 [IEEEXplore] Abstract Graphical models are often used to represent and recognize activities. Purely [...]]]></description>
			<content:encoded><![CDATA[<p>Yifan Shi, Bobick, A., Essa, I. (2006), &#8220;<strong>Learning Temporal Sequence Model from Partially Labeled Data</strong>&#8221;, Proceedings of <em>IEEE Computer Society Conference on Computer Vision and Pattern Recognition</em>, 2006<br />
Volume: 2, page(s): 1631 &#8211; 1638, ISSN: 1063-6919, ISBN: 0-7695-2597-0, Digital Object Identifier: 10.1109/CVPR.2006.174 <a href="http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1640951&amp;isnumber=34374&amp;punumber=10924&amp;k2dockey=1640951@ieeecnfs&amp;query=%28%28essa%29%3Cin%3Eau+%29&amp;pos=15">[IEEEXplore]</a></p>
<p align="center"><strong>Abstract</strong></p>
<p style="text-align: justify;">Graphical models are often used to represent and recognize activities. Purely unsupervised methods (such as HMMs) can be trained automatically but yield models whose internal structure &#8211; the nodes &#8211; is difficult to interpret semantically. Manually constructed networks typically have nodes corresponding to sub-events, but the programming and training of these networks is tedious and requires extensive domain expertise. In this paper, we propose a semi-supervised approach in which a manually structured Propagation Network (a form of a DBN) is initialized from a small amount of fully annotated data, and then refined by an EM-based learning method in an unsupervised fashion. During node refinement (the M step) a boosting-based algorithm is employed to train the evidence detectors of individual nodes. Experiments on a variety of data types &#8211; vision and inertial measurements &#8211; in several tasks demonstrate the ability to learn from as little as one fully annotated example accompanied by a small number of positive but non-annotated training examples. The system is applied to both recognition and anomaly detection tasks.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2006/06/14/ieeexplore-learning-temporal-sequence-model-from-partially-labeled-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: IEEE CVPR (2005) &#8220;Tracking multiple objects through occlusions&#8221;</title>
		<link>http://prof.irfanessa.com/2005/06/20/paper-ieee-cvpr-2005-tracking-multiple-objects-through-occlusions/</link>
		<comments>http://prof.irfanessa.com/2005/06/20/paper-ieee-cvpr-2005-tracking-multiple-objects-through-occlusions/#comments</comments>
		<pubDate>Mon, 20 Jun 2005 17:13:50 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Aware Home]]></category>
		<category><![CDATA[PAMI/ICCV/CVPR/ECCV]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Yan Huang]]></category>
		<category><![CDATA[2005]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2005/06/20/paper-ieee-cvpr-2005-tracking-multiple-objects-through-occlusions/</guid>
		<description><![CDATA[Huang, Y. and Essa, I. (2005) &#8220;Tracking multiple objects through occlusions&#8221;,  In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005 (CVPR 2005), Volume: 2, page(s): 1051 &#8211; 1058, ISSN: 1063-6919, ISBN: 0-7695-2372-2, INSPEC Accession Number: 8633324, DOI: 10.1109/CVPR.2005.350, [IEEEXplore#], 20-25 June 2005 ABSTRACT We present an approach for tracking [...]]]></description>
			<content:encoded><![CDATA[<p>Huang, Y. and Essa, I. (2005) &#8220;Tracking multiple objects through occlusions&#8221;,  In Proceedings of IEEE Computer Society Conference on <em>Computer Vision and Pattern Recognition</em>, 2005 (CVPR 2005), Volume: 2, page(s): 1051 &#8211; 1058, ISSN: 1063-6919, ISBN: 0-7695-2372-2, INSPEC Accession Number: 8633324, DOI: 10.1109/CVPR.2005.350, <a href="http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1467559&amp;isnumber=31473&amp;punumber=9901&amp;k2dockey=1467559@ieeecnfs&amp;query=%28%28essa%29%3Cin%3Eau+%29&amp;pos=18">[IEEEXplore#]</a>, 20-25 June 2005</p>
<p align="center"><strong>ABSTRACT</strong></p>
<p style="text-align: justify;">We present an approach for tracking a varying number of objects through both temporally and spatially significant occlusions. Our method builds on the idea of object permanence to reason about occlusions. To this end, tracking is performed at both the region level and the object level. At the region level, a customized genetic algorithm is used to search for optimal region tracks. This limits the scope of object trajectories. At the object level, each object is located based on adaptive appearance models, spatial distributions and inter-occlusion relationships. The proposed architecture is capable of tracking objects even in the presence of long periods of full occlusions. We demonstrate the viability of this approach by experimenting on several videos of a user interacting with a variety of objects on a desktop.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2005/06/20/paper-ieee-cvpr-2005-tracking-multiple-objects-through-occlusions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Talk at USC&#8217;s IRIS (2004): &#8220;Temporal Reasoning from Video to Temporal Synthesis of Video&#8221;</title>
		<link>http://prof.irfanessa.com/2004/10/30/talk-at-uscs-iris-2004/</link>
		<comments>http://prof.irfanessa.com/2004/10/30/talk-at-uscs-iris-2004/#comments</comments>
		<pubDate>Sun, 31 Oct 2004 01:09:39 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Aware Home]]></category>
		<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[2004]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://irfan.essa.org/wp/2004/10/30/talk-at-uscs-iris-2004/</guid>
		<description><![CDATA[Irfan Essa (2004), &#8220;Temporal Reasoning from Video to Temporal Synthesis of Video&#8221; Talk at USC&#8217;s IRIS-Vision Seminars (Fall 2004). Temporal Reasoning from Video to Temporal Synthesis of Video Abstract In this talk, I will present some ongoing work on extracting spatio-temporal cues from video for both synthesis of novel video sequences, and recognition of complex [...]]]></description>
			<content:encoded><![CDATA[<ul>
<li>Irfan Essa (2004), &#8220;Temporal Reasoning from Video to Temporal Synthesis of Video&#8221;<a href="http://iris.usc.edu/Information/seminars/essa.html"> Talk at USC&#8217;s IRIS-Vision Seminars (Fall 2004).</a></li>
</ul>
<p align="center"><strong>Temporal Reasoning from Video to Temporal Synthesis of Video</strong></p>
<p align="center">Abstract</p>
<p style="text-align: justify;">In this talk, I will present some ongoing work on extracting spatio-temporal cues from video for both synthesis of novel video sequences, and recognition of complex activities. I will start off with some of our earlier work on Video Textures, where repeating information is extracted to generate extended sequences of videos. I will then describe some of our extensions to this approach that allow for controlled generation of animations of video sprites. We have developed various learning and optimization techniques that allow for video-based animations of photo-realistic characters. Then I will describe our new approach for image and video synthesis that builds on optimal patch-based copying of samples. I will show how our method allows for iterative refinement and extends to synthesis of both images and video from very limited samples. In the next part of my talk, I will describe how a similar analysis of video can be used to recognize what a person is doing in a scene. Such an analysis of video, aimed at recognition, requires more contextual information about the environment. I will show how we leverage contextual information shared between actions and objects to recognize what is happening in complex environments. I will also show that by adding some form of grammar (we use Stochastic Context Free Grammar) we can recognize very complex, multi-tasked activities.</p>
<p style="text-align: justify;">If time permits, I will describe (very briefly) the Aware Home project at Georgia Tech, which is one primary area of ongoing and future research for me and my group. Further information on my work with videos is available from my webpage at <a href="http://www.cc.gatech.edu/%7Eirfan">http://www.cc.gatech.edu/~irfan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2004/10/30/talk-at-uscs-iris-2004/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: IEEE CVPR (2004) &#8220;Asymmetrically boosted HMM for speech reading&#8221;</title>
		<link>http://prof.irfanessa.com/2004/06/02/ieeexplore-asymmetrically-boosted-hmm-for-speech-reading/</link>
		<comments>http://prof.irfanessa.com/2004/06/02/ieeexplore-asymmetrically-boosted-hmm-for-speech-reading/#comments</comments>
		<pubDate>Wed, 02 Jun 2004 22:46:44 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Funding]]></category>
		<category><![CDATA[James Rehg]]></category>
		<category><![CDATA[NSF (0205507)]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Pei Yin]]></category>
		<category><![CDATA[2004]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[CVPR]]></category>
		<category><![CDATA[Faces]]></category>
		<category><![CDATA[NSF]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2004/06/02/ieeexplore-asymmetrically-boosted-hmm-for-speech-reading/</guid>
		<description><![CDATA[Pei Yin, Essa, I., Rehg, J.M. (2004) &#8220;Asymmetrically boosted HMM for speech reading&#8221;, In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 (CVPR 2004). Publication Date: 27 June-2 July 2004, Volume: 2, On page(s): II-755 &#8211; II-761 Vol.2 ISSN: 1063-6919, ISBN: 0-7695-2158-, INSPEC Accession Number:8161546, Digital Object Identifier: 10.1109/CVPR.2004.1315240 [...]]]></description>
			<content:encoded><![CDATA[<p>Pei Yin, Essa, I., Rehg, J.M. (2004) &#8220;<a href="http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1315240&amp;isnumber=29134&amp;punumber=9183&amp;k2dockey=1315240@ieeecnfs&amp;query=%28%28essa%29%3Cin%3Eau+%29&amp;pos=22">Asymmetrically boosted HMM for speech reading</a>&#8221;, In <em>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 (CVPR 2004)</em>. Publication Date: 27 June-2 July 2004, Volume: 2, On page(s): II-755 &#8211; II-761 Vol.2 ISSN: 1063-6919, ISBN: 0-7695-2158-, INSPEC Accession Number:8161546, Digital Object Identifier: 10.1109/CVPR.2004.1315240</p>
<p align="center"><strong>Abstract</strong></p>
<p style="text-align: justify;">Speech reading, also known as lip reading, is aimed at extracting visual cues of lip and facial movements to aid in recognition of speech. The main hurdle for speech reading is that visual measurements of lip and facial motion lack information-rich features like the Mel frequency cepstral coefficients (MFCC), widely used in acoustic speech recognition. These MFCC are used with hidden Markov models (HMM) in most speech recognition systems at present. Speech reading could greatly benefit from automatic selection and formation of informative features from measurements in the visual domain. These new features can then be used with HMM to capture the dynamics of lip movement and eventual recognition of lip shapes. Towards this end, we use AdaBoost methods for automatic visual feature formation. Specifically, we design an asymmetric variant of the AdaBoost M2 algorithm to deal with the ill-posed multi-class sample distribution inherent in our problem. Our experiments show that the boosted HMM approach outperforms conventional AdaBoost and HMM classifiers. Our primary contributions are in the design of (a) boosted HMM and (b) asymmetric multi-class boosting.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2004/06/02/ieeexplore-asymmetrically-boosted-hmm-for-speech-reading/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: IEEE CVPR (2004) &#8220;Propagation networks for recognition of partially ordered sequential action&#8221;</title>
		<link>http://prof.irfanessa.com/2004/06/02/ieeexplore-propagation-networks-for-recognition-of-partially-ordered-sequential-action/</link>
		<comments>http://prof.irfanessa.com/2004/06/02/ieeexplore-propagation-networks-for-recognition-of-partially-ordered-sequential-action/#comments</comments>
		<pubDate>Wed, 02 Jun 2004 22:44:31 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Aaron Bobick]]></category>
		<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Aware Home]]></category>
		<category><![CDATA[David Minnen]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Yan Huang]]></category>
		<category><![CDATA[Yifan Shi]]></category>
		<category><![CDATA[2004]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[DVFX]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2004/06/02/ieeexplore-propagation-networks-for-recognition-of-partially-ordered-sequential-action/</guid>
		<description><![CDATA[Yifan Shi, Yan Huang, Minnen, D., Bobick, A., Essa, I. (2004), &#8220;Propagation networks for recognition of partially ordered sequential action&#8221; In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 (CVPR 2004). Volume: 2, page(s): II-862 &#8211; II-869 Vol.2, ISSN: 1063-6919, ISBN: 0-7695-2158-4, INSPEC Accession Number:8161557, Digital Object Identifier: [...]]]></description>
			<content:encoded><![CDATA[<p>Yifan Shi, Yan Huang,   Minnen, D.,   Bobick, A.,   Essa, I. (2004), &#8220;Propagation networks for recognition of partially ordered sequential action&#8221; In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 (CVPR 2004). Volume: 2, page(s): II-862 &#8211; II-869 Vol.2, ISSN: 1063-6919, ISBN: 0-7695-2158-4, INSPEC Accession Number:8161557, Digital Object Identifier: 10.1109/CVPR.2004.1315255, 27 June-2 July 2004<a href="http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1315255&amp;isnumber=29134&amp;punumber=9183&amp;k2dockey=1315255@ieeecnfs&amp;query=%28%28essa%29%3Cin%3Eau+%29&amp;pos=21"> (IEEEXplore)</a></p>
<p align="center"><strong>Abstract</strong></p>
<p style="text-align: justify;">We present propagation networks (P-nets), a novel approach for representing and recognizing sequential activities that include parallel streams of action. We represent each activity using partially ordered intervals. Each interval is restricted by both temporal and logical constraints, including information about its duration and its temporal relationship with other intervals. P-nets associate one node with each temporal interval. Each node is triggered according to a probability density function that depends on the state of its parent nodes. Each node also has an associated observation function that characterizes supporting perceptual evidence. To facilitate real-time analysis, we introduce a particle filter framework to explore the conditional state space. We modify the original condensation algorithm to more efficiently sample a discrete state space (D-condensation). Experiments in the domain of blood glucose monitor calibration demonstrate both the representational power of P-nets and the effectiveness of the D-condensation algorithm.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2004/06/02/ieeexplore-propagation-networks-for-recognition-of-partially-ordered-sequential-action/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thesis: Gabriel Brostow&#8217;s PhD (2004): &#8220;Novel Skeletal Representation for Articulated Creatures&#8221;</title>
		<link>http://prof.irfanessa.com/2004/04/09/gabriel-brostows-phd-thesis-2004-novel-skeletal-representation-for-articulated-creatures/</link>
		<comments>http://prof.irfanessa.com/2004/04/09/gabriel-brostows-phd-thesis-2004-novel-skeletal-representation-for-articulated-creatures/#comments</comments>
		<pubDate>Fri, 09 Apr 2004 16:17:01 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Gabriel Brostow]]></category>
		<category><![CDATA[Modeling and Animation]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Thesis]]></category>
		<category><![CDATA[Animation]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://essa.org/irfan/wp/?p=16</guid>
		<description><![CDATA[<p>We define a Spine as a branching axial structure representing the shape and topology of a 3D object&#8217;s limbs, and capturing the limbs&#8217; correspondence and motion over time. ... In general, our approach combines the objectives of generalized cylinders, 3D scanning, and markerless motion capture to generate baseline models from real puppets, animals, and human subjects.</p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://mi.eng.cam.ac.uk/~gjb47/" target="_blank">Gabriel Brostow</a> (2004), <a href="http://smartech.gatech.edu/handle/1853/5236">&#8220;Novel Skeletal Representation for Articulated Creatures&#8221;</a> PhD Thesis, Georgia Institute of Technology, College of Computing. (Advisor: Irfan Essa) [<a href="http://www.cc.gatech.edu/cpl/projects/spines/Thesis/Gabriel_Brostow_PhDThesis.pdf">PDF</a>] [<a href="http://hdl.handle.net/1853/5236" target="_blank">URI</a>]</p><p><strong>Abstract</strong></p><p>This research examines an approach for capturing 3D surface and structural data of moving articulated creatures. Given the task of non-invasively and automatically capturing such data, a methodology <a href="http://www.cc.gatech.edu/cpl/projects/spines/"><img src="http://www-static.cc.gatech.edu/%7Ebrostow/wwwHelpers/index.3.jpg" border="0" alt="" width="256" height="425" align="left" /></a>and the associated experiments are presented that apply to multiview videos of the subject&#8217;s motion. Our thesis states: A functional structure and the time-varying surface of an articulated creature subject are contained in a sequence of its 3D data. A functional structure is one example of the possible arrangements of internal mechanisms (kinematic joints, springs, etc.) that is capable of performing the motions observed in the input data. Volumetric structures are frequently used as shape descriptors for 3D data. The capture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving. In this research, we examine vision-based modeling and the related representation of moving articulated creatures using Spines. We define a Spine as a branching axial structure representing the shape and topology of a 3D object&#8217;s limbs, and capturing the limbs&#8217; correspondence and motion over time.
The Spine concept builds on skeletal representations often used to describe the internal structure of an articulated object and the significant protrusions. Our representation of a Spine provides for enhancements over a 3D skeleton. These enhancements form temporally consistent limb hierarchies that contain correspondence information about real motion data. We present a practical implementation that approximates a Spine&#8217;s joint probability function to reconstruct Spines for synthetic and real subjects that move. In general, our approach combines the objectives of generalized cylinders, 3D scanning, and markerless motion capture to generate baseline models from real puppets, animals, and human subjects.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2004/04/09/gabriel-brostows-phd-thesis-2004-novel-skeletal-representation-for-articulated-creatures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: ICCV (2003) &#8220;Spectral partitioning for structure from motion&#8221;</title>
		<link>http://prof.irfanessa.com/2003/10/13/paper-iccv-2003-spectral-partitioning-for-structure-from-motion/</link>
		<comments>http://prof.irfanessa.com/2003/10/13/paper-iccv-2003-spectral-partitioning-for-structure-from-motion/#comments</comments>
		<pubDate>Mon, 13 Oct 2003 14:29:13 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[Drew Steedly]]></category>
		<category><![CDATA[Frank Dellaert]]></category>
		<category><![CDATA[PAMI/ICCV/CVPR/ECCV]]></category>
		<category><![CDATA[2003]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Structure from Motion]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/?p=234</guid>
		<description><![CDATA[Steedly, D., Essa, I., Dellaert, F. (2003), &#8220;Spectral partitioning for structure from motion&#8221;, In Proceedings. Ninth IEEE International Conference on Computer Vision, 2003, 13-16 Oct. 2003, page(s): 996 &#8211; 1003 vol.2, Nice, France, ISBN: 0-7695-1950-4, INSPEC Accession Number:7971018, Digital Object Identifier: 10.1109/ICCV.2003.1238457, [IEEEXplore#] Abstract We propose a spectral partitioning approach for large-scale optimization problems, specifically [...]]]></description>
			<content:encoded><![CDATA[<p>Steedly, D., Essa, I., Dellaert, F. (2003), &#8220;Spectral partitioning for structure from motion&#8221;, In<em> Proceedings. Ninth IEEE International Conference on Computer Vision, 2003</em>, 13-16 Oct. 2003, page(s): 996 &#8211; 1003 vol.2, Nice, France, ISBN: 0-7695-1950-4, INSPEC Accession Number:7971018, Digital Object Identifier: 10.1109/ICCV.2003.1238457, [<a href="http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=1238457" target="_blank">IEEEXplore#</a>]</p>
<p style="text-align: center;">
<strong>Abstract</strong></p>
<p style="text-align: justify;">
We propose a spectral partitioning approach for large-scale optimization problems, specifically structure from motion. In structure from motion, partitioning methods reduce the problem into smaller and better conditioned subproblems which can be efficiently optimized. Our partitioning method uses only the Hessian of the reprojection error and its eigenvector. We show that partitioned systems that preserve the eigenvectors corresponding to small eigenvalues result in lower residual error when optimized. We create partitions by clustering the entries of the eigenvectors of the Hessian corresponding to small eigenvalues. This is a more general technique than relying on domain knowledge and heuristics such as bottom-up structure from motion approaches. Simultaneously, it takes advantage of more information than generic matrix partitioning algorithms.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2003/10/13/paper-iccv-2003-spectral-partitioning-for-structure-from-motion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Papers: ACM SIGGRAPH (2003) &#8220;Graphcut textures&#8221;</title>
		<link>http://prof.irfanessa.com/2003/07/25/graphcut-textures/</link>
		<comments>http://prof.irfanessa.com/2003/07/25/graphcut-textures/#comments</comments>
		<pubDate>Sat, 26 Jul 2003 01:29:52 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[ACM SIGGRAPH]]></category>
		<category><![CDATA[Aaron Bobick]]></category>
		<category><![CDATA[Arno Schödl]]></category>
		<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[Greg Turk]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Vivek Kwatra]]></category>
		<category><![CDATA[2003]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[SIGGRAPH]]></category>
		<category><![CDATA[Texture Synthesis]]></category>
		<category><![CDATA[Video Textures]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2003/07/25/graphcut-textures/</guid>
		<description><![CDATA[Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, Aaron Bobick (2003), &#8220;Graphcut textures: image and video synthesis using graph cuts&#8221; In ACM Transactions on Graphics (TOG), Volume 22 , Issue 3, Proceedings of ACM SIGGRAPH 2003, Pages: 277 &#8211; 286, July 2003, ISSN:0730-0301. (DOI&#124;Paper&#124; SIGGRAPH Video (160 MB, 50 MB) &#124; Video Results 87 MB [...]]]></description>
			<content:encoded><![CDATA[<p>Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, Aaron Bobick (2003), &#8220;<a href="http://portal.acm.org/citation.cfm?id=882264&amp;dl=ACM&amp;coll=ACM&amp;CFID=63156436&amp;CFTOKEN=24591103">Graphcut textures</a>: image and video synthesis using graph cuts&#8221; In ACM Transactions on Graphics (TOG), Volume 22 ,  Issue 3, Proceedings of ACM SIGGRAPH 2003, Pages: 277 &#8211; 286, July 2003, ISSN:0730-0301. (<a href="http://doi.acm.org/10.1145/882262.882264" target="_blank">DOI</a>|<a href="http://www-static.cc.gatech.edu/gvu/perception/projects/graphcuttextures/gc-final.pdf" target="_blank">Paper</a>|<span style="color: #ccccff;"> </span>SIGGRAPH Video (<a href="http://www-static.cc.gatech.edu/gvu/perception/projects/graphcuttextures/2003_Graphcut_DVD.mpg">160 MB</a>, <a href="http://www-static.cc.gatech.edu/gvu/perception/projects/graphcuttextures/2003_Graphcut_DVD_Jerky.mpg">50 MB</a>)  | <a href="http://www-static.cc.gatech.edu/gvu/perception/projects/graphcuttextures/VideoResults.mpg">Video Results 87 MB</a> | <a href="http://www-static.cc.gatech.edu/gvu/perception/projects/graphcuttextures/" target="_blank">Project Site</a>)</p>
<p align="center"><strong>ABSTRACT</strong></p>
<p>In this paper we introduce a new algorithm for image and video texture synthesis. In our approach, patch regions from a sample image or video are transformed and copied to the output and then stitched together along optimal seams to generate a new (and typically larger) output. In contrast to other techniques, the size of the <a title="GC-TOC" href="http://academics.irfanessa.com/wp-content/uploads/2008/04/gc-vtoc.jpg"><img src="http://academics.irfanessa.com/wp-content/uploads/2008/04/gc-vtoc.jpg" alt="GC-TOC" hspace="5" vspace="5" align="left" /></a>patch is not chosen a-priori, but instead a graph cut technique is used to determine the optimal patch region for any given offset between the input and output texture. Unlike dynamic programming, our graph cut technique for seam optimization is applicable in any dimension. We specifically explore it in 2D and 3D to perform video texture synthesis in addition to regular image synthesis. We present approximative offset search techniques that work well in conjunction with the presented patch size optimization. We show results for synthesizing regular, random, and natural images and videos. We also demonstrate how this method can be used to interactively merge different images to generate new scenes.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2003/07/25/graphcut-textures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://www-static.cc.gatech.edu/gvu/perception/projects/graphcuttextures/2003_Graphcut_DVD_Jerky.mpg" length="51089412" type="video/mpeg" />
<enclosure url="http://www-static.cc.gatech.edu/gvu/perception/projects/graphcuttextures/VideoResults.mpg" length="89126156" type="video/mpeg" />
		</item>
		<item>
		<title>Funding: NSF/ITR (2002) &#8220;Analysis of Complex Audio-Visual Events Using Spatially Distributed Sensors&#8221;</title>
		<link>http://prof.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/</link>
		<comments>http://prof.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/#comments</comments>
		<pubDate>Tue, 01 Oct 2002 14:56:34 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Funding]]></category>
		<category><![CDATA[James Rehg]]></category>
		<category><![CDATA[NSF (0205507)]]></category>
		<category><![CDATA[2002]]></category>
		<category><![CDATA[Audio Analysis]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[NSF]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/</guid>
		<description><![CDATA[Award#0205507 &#8211; ITR: Analysis of Complex Audio-Visual Events Using Spatially Distributed Sensors ABSTRACT We propose to develop a comprehensive framework for the joint analysis of audio-visual signals obtained from spatially distributed microphones and cameras. We desire solutions to the audio-visual sensing problem that will scale to an arbitrary number of cameras and microphones and can [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://nsf.gov/awardsearch/showAward.do?AwardNumber=0205507">Award#0205507 &#8211; ITR: Analysis of Complex Audio-Visual Events Using Spatially Distributed Sensors</a></p>
<p style="text-align: center;"><strong>ABSTRACT</strong></p>
<p style="text-align: justify;">We propose to develop a comprehensive framework for the joint analysis of audio-visual signals obtained from spatially distributed microphones and cameras. We desire solutions to the audio-visual sensing problem that will scale to an arbitrary number of cameras and microphones and can address challenging environments in which there are multiple speech and nonspeech sound sources and multiple moving people and objects. Recently it has become relatively inexpensive to deploy tens or even hundreds of cameras and microphones in an environment. Many applications could benefit from the ability to sense in both modalities. There are two levels at which joint audio-visual analysis can take place. At the signal level, the challenge is to develop representations that capture the rich dependency structure in the joint signal and deal successfully with issues such as variable sampling rates and varying temporal delays between cues. At the spatial level, the challenge is to compensate for the distortions introduced by the sensor location and pool information across sensors to recover 3-D information about the spatial environment. For many applications, it is highly desirable if the solution method is self-calibrating, and does not require an extensive manual calibration process every time a new sensor is added or an old sensor is moved or replaced. Removing the burden of manual calibration also makes it possible to exploit ad hoc sensor networks which could arise, for example, from wearable microphones and cameras. We propose to address the following four research topics: 1. Representations and learning methods for signal level fusion. 2. Volumetric techniques for fusing spatially distributed audio-visual data. 3. Self-calibration of distributed microphone-camera systems. 4. Applications of audio-visual sensing. For example, this proposal includes considerable work on lip and facial analysis to improve voice communications.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper AAAI (2002): &#8220;Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar&#8221;</title>
		<link>http://prof.irfanessa.com/2002/09/29/paper-aaai-2002-recognizing-multitasked-activities-from-video-using-stochastic-context-free-grammar/</link>
		<comments>http://prof.irfanessa.com/2002/09/29/paper-aaai-2002-recognizing-multitasked-activities-from-video-using-stochastic-context-free-grammar/#comments</comments>
		<pubDate>Sun, 29 Sep 2002 15:13:50 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[AAAI/IJCAI/UAI]]></category>
		<category><![CDATA[Activity Recognition]]></category>
		<category><![CDATA[Darnell Moore]]></category>
		<category><![CDATA[Intelligent Environments]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[2002]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://prof.irfanessa.com/?p=631</guid>
		<description><![CDATA[D. Moore and I. Essa (2002). &#8220;Recognizing multitasked activities from video using stochastic context-free grammar&#8221;, in Proceedings of AAAI 2002. [PDF &#124; Project Site] Abstract In this paper, we present techniques for recognizing complex, multitasked activities from video. Visual information like image features and motion appearances, combined with domain-specific information, like object context, is [...]]]></description>
			<content:encoded><![CDATA[<p>D. Moore and I. Essa (2002). &#8220;Recognizing multitasked activities from video using stochastic context-free grammar&#8221;, in Proceedings of AAAI 2002. [<a href="http://www.aaai.org/Papers/AAAI/2002/AAAI02-116.pdf">PDF</a> | <a href="http://www.cc.gatech.edu/cpl/projects/objectspaces/" target="_blank">Project Site</a>]</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">In this paper, we present techniques for recognizing complex, multitasked activities from video. Visual information like image features and motion appearances, combined with domain-specific information, like object context, is used initially to label events. Each action event is represented with a unique symbol, allowing for a sequence of interactions to be described as an ordered symbolic string. Then, a model of stochastic context-free grammar (SCFG), which is developed using underlying rules of an activity, is used to provide the structure for recognizing semantically meaningful behavior over extended periods. Symbolic strings are parsed using the Earley-Stolcke algorithm to determine the most likely semantic derivation for recognition. Parsing substrings allows us to recognize patterns that describe high-level, complex events taking place over segments of the video sequence. We introduce new parsing strategies to enable error detection and recovery in stochastic context-free grammar and methods of quantifying group and individual behavior in activities with separable roles. We show through experiments, with a popular card game, the recognition of high-level narratives of multi-player games and the identification of player strategies and behavior using computer vision.</p>
<p style="text-align: justify;">
<div class="wp-caption aligncenter" style="width: 396px"><a href="http://lh3.ggpht.com/_ukXHDWz1Yr0/SujBSmeQr9I/AAAAAAAA2YI/5Lp-GeSp28Q/OS-bjack.jpg"><img class=" " title="Recognizing Black Jack" src="http://lh3.ggpht.com/_ukXHDWz1Yr0/SujBSmeQr9I/AAAAAAAA2YI/5Lp-GeSp28Q/OS-bjack.jpg" alt="Recognizing Black Jack" width="386" height="183" /></a><p class="wp-caption-text">Recognizing Black Jack</p></div>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2002/09/29/paper-aaai-2002-recognizing-multitasked-activities-from-video-using-stochastic-context-free-grammar/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper ICPR (2002): &#8220;Learning video processing by example&#8221;</title>
		<link>http://prof.irfanessa.com/2002/08/11/ieeexplore-learning-video-processing-by-example/</link>
		<comments>http://prof.irfanessa.com/2002/08/11/ieeexplore-learning-video-processing-by-example/#comments</comments>
		<pubDate>Sun, 11 Aug 2002 19:33:08 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Antonio Haro]]></category>
		<category><![CDATA[Collaborators]]></category>
		<category><![CDATA[Computational Photography and Video]]></category>
		<category><![CDATA[Numerical Machine Learning]]></category>
		<category><![CDATA[PAMI/ICCV/CVPR/ECCV]]></category>
		<category><![CDATA[2002]]></category>
		<category><![CDATA[Computer Vision]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/?p=267</guid>
		<description><![CDATA[Haro, A. and Essa, I. (2002), &#8220;Learning video processing by example&#8221;, in Proceedings of the 16th International Conference on Pattern Recognition, 11-15 Aug. 2002, Volume 1, pages 487 &#8211; 491, ISSN: 1051-4651, ISBN: 0-7695-1695-X, [Digital Object Identifier: 10.1109/ICPR.2002.1044771][IEEEXplore#] Abstract We present an algorithm [...]]]></description>
			<content:encoded><![CDATA[<p>Haro, A. and Essa, I. (2002), &#8220;Learning video processing by example&#8221;, in <em>Proceedings of the 16th International Conference on Pattern Recognition</em>, 11-15 Aug. 2002, Volume 1, pages 487 &#8211; 491, ISSN: 1051-4651, ISBN: 0-7695-1695-X, [Digital Object Identifier: 10.1109/ICPR.2002.1044771][<a href="http://ieeexplore.ieee.org/search/freesrchabstract.jsp?arnumber=1044771&amp;isnumber=22378&amp;punumber=8091&amp;k2dockey=1044771@ieeecnfs&amp;query=((essa)%3Cin%3Eau+)&amp;pos=5&amp;access=yes">IEEEXplore#</a>]</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">We present an algorithm that approximates the output of an arbitrary video processing algorithm based on a pair of input and output exemplars. Our algorithm relies on learning the mapping between the input and output exemplars to model the processing that has taken place. We approximate the processing by observing that pixel neighborhoods similar in appearance and motion to those in the exemplar input should result in neighborhoods similar to the exemplar output. Since there are not many pixel neighborhoods in the exemplars, we use techniques from texture synthesis to generalize the output of neighborhoods not observed in the exemplars. The same algorithm is used to learn such processing as motion blur, color correction, and painting.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2002/08/11/ieeexplore-learning-video-processing-by-example/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
