<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>prof.irfanessa.com &#187; Audio Analysis</title>
	<atom:link href="http://prof.irfanessa.com/tag/audio-analysis/feed/" rel="self" type="application/rss+xml" />
	<link>http://prof.irfanessa.com</link>
	<description>Irfan Essa&#039;s Academic Activities</description>
	<lastBuildDate>Wed, 25 Jan 2012 23:42:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Thesis: Mitch Parry PhD (2007), &#8220;Separation and Analysis of Multichannel Signals&#8221;</title>
		<link>http://prof.irfanessa.com/2007/10/09/mitch-parry-phd-thesis-2007-separation-and-analysis-of-multichannel-signals/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=mitch-parry-phd-thesis-2007-separation-and-analysis-of-multichannel-signals</link>
		<comments>http://prof.irfanessa.com/2007/10/09/mitch-parry-phd-thesis-2007-separation-and-analysis-of-multichannel-signals/#comments</comments>
		<pubDate>Tue, 09 Oct 2007 14:54:50 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[0205507]]></category>
		<category><![CDATA[Audio Analysis]]></category>
		<category><![CDATA[Funding]]></category>
		<category><![CDATA[Mitch Parry]]></category>
		<category><![CDATA[PhD]]></category>
		<category><![CDATA[Thesis]]></category>
		<category><![CDATA[2007]]></category>
		<category><![CDATA[NSF]]></category>

		<guid isPermaLink="false">http://essa.org/irfan/wp/?p=34</guid>
		<description><![CDATA[Mitch Parry (2007), Separation and Analysis of Multichannel Signals PhD Thesis [PDF], Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisor: Irfan Essa) Abstract This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://home.cc.gatech.edu/parry" target="_blank">Mitch Parry</a> (2007), <a href="http://etd.gatech.edu/theses/available/etd-10052007-144600/">Separation and Analysis of Multichannel Signals</a> PhD Thesis [<a href="http://www.cc.gatech.edu/~parry/thesis/parry-thesis.pdf" target="_blank">PDF</a>], Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisor: <a href="http://www.cc.gatech.edu/~irfan">Irfan Essa</a>)</p>
<p><strong>Abstract</strong></p>
<p><a href="http://home.cc.gatech.edu/parry" target="_blank"><img src="http://home.cc.gatech.edu/parry/uploads/1/mitch2.jpg" align="right" height="106" width="130" /></a>This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like to separate and analyze the underlying factors independently. Although source separation applies to a wide variety of signals, this thesis focuses on separating individual instruments from a musical recording. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques.</p>
<p>When the number of source signals exceeds the number of mixtures (i.e., the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2007/10/09/mitch-parry-phd-thesis-2007-separation-and-analysis-of-multichannel-signals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: IEEE ICASSP (2007) &#8220;Incorporating Phase Information for Source Separation via Spectrogram Factorization&#8221;</title>
		<link>http://prof.irfanessa.com/2007/04/15/paper-ieee-icassp-2007-incorporating-phase-information-for-source-separation-via-spectrogram-factorization/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=paper-ieee-icassp-2007-incorporating-phase-information-for-source-separation-via-spectrogram-factorization</link>
		<comments>http://prof.irfanessa.com/2007/04/15/paper-ieee-icassp-2007-incorporating-phase-information-for-source-separation-via-spectrogram-factorization/#comments</comments>
		<pubDate>Sun, 15 Apr 2007 15:22:37 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[0205507]]></category>
		<category><![CDATA[Audio Analysis]]></category>
		<category><![CDATA[Funding]]></category>
		<category><![CDATA[Mitch Parry]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[2007]]></category>
		<category><![CDATA[NSF]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2007/04/15/paper-ieee-icassp-2007-incorporating-phase-information-for-source-separation-via-spectrogram-factorization/</guid>
		<description><![CDATA[Parry, R.M. and Essa, I. (2007) &#8220;Incorporating Phase Information for Source Separation via Spectrogram Factorization.&#8221; In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. 15-20 April 2007, Volume: 2, page(s): II-661 &#8211; II-66, Honolulu, HI, ISSN: 1520-6149, ISBN: 1-4244-0728-1, INSPEC Accession Number: 9497202, Digital Object Identifier: 10.1109/ICASSP.2007.366322 Abstract Spectrogram factorization methods have been proposed for single-channel source separation and audio [...]]]></description>
			<content:encoded><![CDATA[<p>Parry, R.M. and Essa, I. (2007) &#8220;<a href="http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=4217495&amp;isnumber=4217319&amp;punumber=4216989&amp;k2dockey=4217495@ieeecnfs&amp;query=%28%28essa%29%3Cin%3Eau+%29&amp;pos=7">Incorporating Phase Information for Source Separation via Spectrogram Factorization</a>.&#8221; In Proceedings of <em>IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007</em>. 15-20 April 2007, Volume: 2, page(s): II-661 &#8211; II-66, Honolulu, HI, ISSN: 1520-6149, ISBN: 1-4244-0728-1, INSPEC Accession Number: 9497202, Digital Object Identifier: 10.1109/ICASSP.2007.366322</p>
<p align="center"><strong>Abstract</strong></p>
<p>Spectrogram factorization methods have been proposed for single-channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is discarded, and the resulting spectrogram matrix is then factored into the sum of rank-one source spectrograms. This approach incorrectly assumes the mixture spectrogram is the sum of the source spectrograms. In fact, the mixture spectrogram depends on the phase of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve the separation results.</p>
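<p>The dependence on phase can be illustrated with a single time-frequency bin. The following is a minimal sketch and is not taken from the paper: the function name and the unit magnitudes are hypothetical, and it assumes only that the mixture STFT bin is the complex sum of two source bins.</p>
<pre>
```python
import cmath


def mixture_magnitude(mag_a, phase_a, mag_b, phase_b):
    """Magnitude of one mixture STFT bin formed by adding two source bins."""
    # Hypothetical helper: build each source bin from magnitude and phase.
    bin_a = cmath.rect(mag_a, phase_a)
    bin_b = cmath.rect(mag_b, phase_b)
    # The mixture is linear in the complex STFT, not in the magnitudes.
    return abs(bin_a + bin_b)


# Two sources with unit magnitude in the same bin (illustrative values).
in_phase = mixture_magnitude(1.0, 0.0, 1.0, 0.0)
quadrature = mixture_magnitude(1.0, 0.0, 1.0, cmath.pi / 2)
out_of_phase = mixture_magnitude(1.0, 0.0, 1.0, cmath.pi)

print(in_phase, quadrature, out_of_phase)
```
</pre>
<p>Only the in-phase case matches the sum of the source magnitudes. With a quarter-cycle offset the mixture magnitude is the square root of two, and with opposite phase the two sources cancel almost entirely.</p>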
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2007/04/15/paper-ieee-icassp-2007-incorporating-phase-information-for-source-separation-via-spectrogram-factorization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: IEEE ICASSP (2006) &#8220;Source Detection Using Repetitive Structure&#8221;</title>
		<link>http://prof.irfanessa.com/2006/05/14/paper-ieee-icassp-2006-source-detection-using-repetitive-structure/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=paper-ieee-icassp-2006-source-detection-using-repetitive-structure</link>
		<comments>http://prof.irfanessa.com/2006/05/14/paper-ieee-icassp-2006-source-detection-using-repetitive-structure/#comments</comments>
		<pubDate>Sun, 14 May 2006 15:25:18 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[0205507]]></category>
		<category><![CDATA[Audio Analysis]]></category>
		<category><![CDATA[Funding]]></category>
		<category><![CDATA[Mitch Parry]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[2006]]></category>
		<category><![CDATA[NSF]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2006/05/14/paper-ieee-icassp-2006-source-detection-using-repetitive-structure/</guid>
		<description><![CDATA[Parry, R.M. and Essa, I. (2006) &#8220;Source Detection Using Repetitive Structure (IEEEXplore).&#8221; Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006, Publication Date: 14-19 May 2006, Volume: 4, page(s): IV &#8211; IV, Location: Toulouse, ISSN: 1520-6149, ISBN: 1-4244-0469-X, INSPEC Accession Number: 9154520, Digital Object Identifier: 10.1109/ICASSP.2006.1661163 Abstract Blind source separation algorithms typically require that the number of sources is known in advance. [...]]]></description>
			<content:encoded><![CDATA[<p>Parry, R.M. and Essa, I. (2006) &#8220;<a href="http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1661163&amp;isnumber=34760&amp;punumber=11024&amp;k2dockey=1661163@ieeecnfs&amp;query=%28%28essa%29%3Cin%3Eau+%29&amp;pos=10">Source Detection Using Repetitive Structure (IEEEXplore)</a>.&#8221; Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006, Publication Date: 14-19 May 2006, Volume: 4, page(s): IV &#8211; IV, Location: Toulouse, ISSN: 1520-6149, ISBN: 1-4244-0469-X, INSPEC Accession Number: 9154520, Digital Object Identifier: 10.1109/ICASSP.2006.1661163</p>
<p align="center"><strong>Abstract</strong></p>
<p style="text-align: justify;">Blind source separation algorithms typically require that the number of sources is known in advance. However, it is often the case that the number of sources changes over time and that the total number is not known. Existing source separation techniques require source number estimation methods to determine how many sources are active within the mixture signals. These methods typically operate on the covariance matrix of mixture recordings and require fewer active sources than mixtures. When sources do not overlap in the time-frequency domain, more sources than mixtures may be detected and then separated. However, separating more sources than mixtures when sources overlap in time and frequency poses a particularly difficult problem. This paper addresses the issue of source detection when more sources than sensors overlap in time and frequency. We show that repetitive structure in the form of time-time correlation matrices can reveal when each source is active.</p>
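<p>As a rough illustration of a time-time correlation matrix, the sketch below compares every pair of analysis frames. It is not the paper's formulation, which this abstract does not spell out: it assumes cosine similarity between frames, and the function name and the three toy frames are hypothetical.</p>
<pre>
```python
import math


def time_time_correlation(frames):
    """Return the matrix of cosine similarities between every pair of frames."""
    norms = [math.sqrt(sum(v * v for v in frame)) for frame in frames]
    matrix = []
    for i, row_frame in enumerate(frames):
        row = []
        for j, col_frame in enumerate(frames):
            dot = sum(a * b for a, b in zip(row_frame, col_frame))
            scale = norms[i] * norms[j]
            # Silent (all-zero) frames get zero similarity by convention here.
            row.append(dot / scale if scale > 0 else 0.0)
        matrix.append(row)
    return matrix


# Hypothetical magnitude spectra: pattern A, pattern B, then pattern A again.
frames = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
]
correlation = time_time_correlation(frames)

for row in correlation:
    print(row)
```
</pre>
<p>In this toy input, frames 0 and 2 repeat the same pattern, so the off-diagonal entry between them is high while the entries against frame 1 are zero. Repetition of this kind is the sort of structure the abstract refers to.</p>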
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2006/05/14/paper-ieee-icassp-2006-source-detection-using-repetitive-structure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Funding: NSF/ITR (2002) &#8220;Analysis of Complex Audio-Visual Events Using Spatially Distributed Sensors&#8221;</title>
		<link>http://prof.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors</link>
		<comments>http://prof.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/#comments</comments>
		<pubDate>Tue, 01 Oct 2002 14:56:34 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[0205507]]></category>
		<category><![CDATA[Funding]]></category>
		<category><![CDATA[James Rehg]]></category>
		<category><![CDATA[2002]]></category>
		<category><![CDATA[Audio Analysis]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[NSF]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/</guid>
		<description><![CDATA[Award#0205507 &#8211; ITR: Analysis of Complex Audio-Visual Events Using Spatially Distributed Sensors ABSTRACT We propose to develop a comprehensive framework for the joint analysis of audio-visual signals obtained from spatially distributed microphones and cameras. We desire solutions to the audio-visual sensing problem that will scale to an arbitrary number of cameras and microphones and can [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://nsf.gov/awardsearch/showAward.do?AwardNumber=0205507">Award#0205507 &#8211; ITR: Analysis of Complex Audio-Visual Events Using Spatially Distributed Sensors</a></p>
<p style="text-align: center;"><strong>ABSTRACT</strong></p>
<p style="text-align: justify;">We propose to develop a comprehensive framework for the joint analysis of audio-visual signals obtained from spatially distributed microphones and cameras. We desire solutions to the audio-visual sensing problem that will scale to an arbitrary number of cameras and microphones and can address challenging environments in which there are multiple speech and nonspeech sound sources and multiple moving people and objects. Recently it has become relatively inexpensive to deploy tens or even hundreds of cameras and microphones in an environment. Many applications could benefit from the ability to sense in both modalities. There are two levels at which joint audio-visual analysis can take place. At the signal level, the challenge is to develop representations that capture the rich dependency structure in the joint signal and deal successfully with issues such as variable sampling rates and varying temporal delays between cues. At the spatial level, the challenge is to compensate for the distortions introduced by the sensor location and pool information across sensors to recover 3-D information about the spatial environment. For many applications, it is highly desirable if the solution method is self-calibrating, and does not require an extensive manual calibration process every time a new sensor is added or an old sensor is moved or replaced. Removing the burden of manual calibration also makes it possible to exploit ad hoc sensor networks which could arise, for example, from wearable microphones and cameras. We propose to address the following four research topics: 1. Representations and learning methods for signal level fusion. 2. Volumetric techniques for fusing spatially distributed audio-visual data. 3. Self-calibration of distributed microphone-camera systems. 4. Applications of audio-visual sensing. For example, this proposal includes considerable work on lip and facial analysis to improve voice communications.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/2002/10/01/funding-nsfitr-2002-analysis-of-complex-audio-visual-events-using-spatially-distributed-sensors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Paper: PUI (1997) &#8220;Prosody Analysis for Speaker Affect Determination&#8221;</title>
		<link>http://prof.irfanessa.com/1997/10/12/paper-pui-1997-prosody-analysis-for-speaker-affect-determination/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=paper-pui-1997-prosody-analysis-for-speaker-affect-determination</link>
		<comments>http://prof.irfanessa.com/1997/10/12/paper-pui-1997-prosody-analysis-for-speaker-affect-determination/#comments</comments>
		<pubDate>Sun, 12 Oct 1997 19:18:58 +0000</pubDate>
		<dc:creator>Irfan Essa</dc:creator>
				<category><![CDATA[Affective Computing]]></category>
		<category><![CDATA[Papers]]></category>
		<category><![CDATA[1997]]></category>
		<category><![CDATA[Audio Analysis]]></category>
		<category><![CDATA[HCI]]></category>

		<guid isPermaLink="false">http://academics.irfanessa.com/?p=259</guid>
		<description><![CDATA[Andrew Gardner and Irfan Essa (1997) &#8220;Prosody Analysis for Speaker Affect Determination&#8221; In Proceedings of Perceptual User Interfaces Workshop (PUI 1997), Banff, Alberta, CANADA, Oct 1997 [PDF][Project Site] Abstract Speech is a complex waveform containing verbal (e.g. phoneme, syllable, and word) and nonverbal (e.g. speaker identity, emotional state, and tone) information. Both the verbal and [...]]]></description>
			<content:encoded><![CDATA[<p>Andrew Gardner and Irfan Essa (1997) &#8220;<a href="http://www-static.cc.gatech.edu/cpl/pubs/pui.97/">Prosody Analysis for Speaker Affect Determination</a>&#8221; In Proceedings of Perceptual User Interfaces Workshop (PUI 1997), Banff, Alberta, CANADA, Oct 1997 [<a href="http://www-static.cc.gatech.edu/cpl/pubs/pui.97/pui97.pdf" target="_self">PDF</a>][<a href="http://www-static.cc.gatech.edu/cpl/pubs/pui.97/" target="_self">Project Site</a>]</p>
<p style="text-align: center;"><strong>Abstract</strong></p>
<p style="text-align: justify;">Speech is a complex waveform containing verbal (e.g. phoneme, syllable, and word) and nonverbal (e.g. speaker identity, emotional state, and tone) information. Both the verbal and nonverbal aspects of speech are extremely important in interpersonal communication and human-machine interaction. However, work in machine perception of speech has focused primarily on the verbal, or content-oriented, goals of speech recognition, speech compression, and speech labeling. Usage of nonverbal information has been limited to speaker identification applications. While the success of research in these areas is well documented, this success is fundamentally limited by the effect of nonverbal information on the speech waveform. The extra-linguistic aspect of speech is considered a source of variability that theoretically can be minimized with an appropriate preprocessing technique; determination of such robust techniques is, however, far from trivial.</p>
<p style="text-align: justify;">It is widely believed in the speech processing community that the nonverbal component of speech contains higher-level information that provides cues for auditory scene analysis, speech understanding, and the determination of a speaker&#8217;s psychological state or conversational tone. We believe that the identification of such nonverbal cues can improve the performance of classic speech processing tasks and will be necessary for the realization of natural, robust human-computer speech interfaces. In this paper we seek to address the problem of how to systematically analyze the nonverbal aspect of the speech waveform to determine speaker affect, specifically by analyzing the pitch contour.</p>
]]></content:encoded>
			<wfw:commentRss>http://prof.irfanessa.com/1997/10/12/paper-pui-1997-prosody-analysis-for-speaker-affect-determination/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

