MENU: Home Bio Affiliations Research Teaching Publications Videos Collaborators/Students Contact FAQ ©2007-15 RSS

Thesis: Mitch Parry PhD (2007), “Separation and Analysis of Multichannel Signals”

October 9th, 2007 Irfan Essa Posted in 0205507, Audio Analysis, Funding, Mitch Parry, PhD, Thesis No Comments »

Mitch Parry (2007), Separation and Analysis of Multichannel Signals PhD Thesis [PDF], Georgia Institute of Techniology, College of Computing, Atlanta, GA. (Advisor: Irfan Essa)


This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like to separate and analyze the underlying factors independently. Although source separation applies to a wide variety of signals, this thesis focuses on separating individual instruments from a musical recording. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques.

When the number of source signals exceeds the number of mixtures (i.e., the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.

via Separation and Analysis of Multichannel Signals.

AddThis Social Bookmark Button

Paper: IEEE ICASSP (2007) “Incorporating Phase Information for Source Separation via Spectrogram Factorization”

April 15th, 2007 Irfan Essa Posted in 0205507, Audio Analysis, Funding, Mitch Parry, Papers, Research No Comments »

Parry, R.M. Essa, I. (2007) “Incorporating Phase Information for Source Separation via Spectrogram Factorization.” In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. 15-20 April 2007, Volume: 2, page(s): II-661 – II-66, Honolulu, HI, ISSN: 1520-6149, ISBN: 1-4244-0728-1, INSPEC Accession Number:9497202, Digital Object Identifier: 10.1109/ICASSP.2007.366322


Spectrogram factorization methods have been proposed for single channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is thrown away and this spectrogram matrix is then factored into the sum of rank-one source spectrograms. This approach incorrectly assumes the mixture spectrogram is the sum of the source spectrograms. In fact, the mixture spectrogram depends on the phase of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve the separation results

AddThis Social Bookmark Button

Paper: IEEE ICASSP (2006) “Source Detection Using Repetitive Structure”

May 14th, 2006 Irfan Essa Posted in 0205507, Audio Analysis, Funding, Mitch Parry, Papers, Research No Comments »

Parry, R.M. Essa, I. (2006) “Source Detection Using Repetitive Structure (IEEEXplore).” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006, Publication Date: 14-19 May 2006, Volume: 4, page(s): IV – IV, Location: Toulouse, ISSN: 1520-6149, ISBN: 1-4244-0469-X, INSPEC Accession Number:9154520, Digital Object Identifier: 10.1109/ICASSP.2006.1661163


Blind source separation algorithms typically require that the number of sources are known in advance. However, it is often the case that the number of sources change over time and that the total number is not known. Existing source separation techniques require source number estimation methods to determine how many sources are active within the mixture signals. These methods typically operate on the covariance matrix of mixture recordings and require fewer active sources than mixtures. When sources do not overlap in the time-frequency domain, more sources than mixtures may be detected and then separated. However, separating more sources than mixtures when sources overlap in time and frequency poses a particularly difficult problem. This paper addresses the issue of source detection when more sources than sensors overlap in time and frequency. We show that repetitive structure in the form of time-time correlation matrices can reveal when each source is active

AddThis Social Bookmark Button

Funding: NSF (1998) Experimental Software Systems “Automated Understanding of Captured Experience”

September 1st, 1998 Irfan Essa Posted in Activity Recognition, Audio Analysis, Aware Home, Funding, Gregory Abowd, Intelligent Environments No Comments »

Award#9806822 – Experimental Software Systems: Automated Understanding of Captured Experience

9806822 Essa, Irfan A. Abowd, Gregory D. Georgia Institute of Technology Experimental Software Systems: Automated Understanding of Captured Experience The objective of this research is to reduce substantially the human input necessary for creating and accessing large collections of multimedia, particularly multimedia created by capturing what is happening in an environment. The existing software system which is being used as the starting point for this investigation is Classroom 2000, a system designed to capture what happens in classrooms, meetings, and offices. Classroom 2000 integrates and synchronizes multiple streams of captured text, images, handwritten annotations, audio, and video. In a sense, it automates note-taking for a lecture or meeting. The research challenge is to make sense of this flood of captured data. The project explores how the output of Classroom 2000 can be automatically structured, segmented, indexed, and linked. Machine learning and statistical approaches to language are used to attempt to understand the captured data. Techniques from computational perception are used to try to find structure in the captured data. An important component of this research is the experimental analysis of the software system being built. The expectation is that this research will have a dramatic impact on how humans work and learn, as technology aids humans by capturing and making accessible what happens in an environment.

AddThis Social Bookmark Button