In the News (2010): DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery

September 2nd, 2010 Irfan Essa Posted in Activity Recognition, Grant Schindler, PERSEAS, Visual Surveillance

via Kitware – News: DARPA Awards Kitware a $13.8 Million Contract for Online Threat Detection and Forensic Analysis in Wide-Area Motion Imagery.

Kitware has received a $13,883,314 contract from Defense Advanced Research Projects Agency (DARPA) to develop a software system capable of automatically and interactively discovering actionable intelligence from wide area motion imagery (WAMI) of complex urban, suburban, and rural environments.

The primary information elements in WAMI data are moving entities in the context of roads, buildings, and other scene features. These entities, while exploitable, often yield fragmented tracks in complex urban environments due to occlusions, stops, and other factors. Kitware’s software system will use algorithmic solutions to associate tracks and then identify and integrate local events to detect potential threats and perform forensic analysis.
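
The post does not say which association algorithm Kitware uses. As a rough illustration of what "associating tracks" can mean, the sketch below greedily joins fragments (tracklets) when a later fragment starts shortly after an earlier one ends, close enough to be physically reachable. The `Tracklet` type, the `link_tracklets` function, and the `max_gap`/`max_speed` thresholds (in arbitrary time and distance units) are hypothetical. Greedy shortest-jump linking is a simple stand-in, not the PerSEAS method.

```python
import math
from dataclasses import dataclass


@dataclass
class Tracklet:
    """A track fragment: where and when it starts and ends (hypothetical type)."""
    tid: int
    start_t: float
    start_xy: tuple
    end_t: float
    end_xy: tuple


def link_tracklets(tracklets, max_gap=5.0, max_speed=15.0):
    """Greedily join track fragments broken by occlusions or stops.

    Returns (earlier_id, later_id) pairs. A link is allowed only if the later
    fragment starts within max_gap time units after the earlier one ends and
    the spatial jump is reachable at max_speed. Shortest jumps are linked first.
    """
    candidates = []
    for a in tracklets:
        for b in tracklets:
            gap = b.start_t - a.end_t
            if a.tid == b.tid or gap <= 0 or gap > max_gap:
                continue
            jump = math.dist(a.end_xy, b.start_xy)
            if jump <= max_speed * gap:
                candidates.append((jump, a.tid, b.tid))
    candidates.sort()
    has_successor, has_predecessor, links = set(), set(), []
    for _, a_id, b_id in candidates:
        # Each fragment continues into at most one later fragment, and vice versa
        if a_id in has_successor or b_id in has_predecessor:
            continue
        has_successor.add(a_id)
        has_predecessor.add(b_id)
        links.append((a_id, b_id))
    return links
```

Because every link points forward in time, the result can never contain a cycle; a vehicle that stops briefly and resumes nearby is stitched back into one track.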

The developed algorithms will form the basis of a software prototype called the Persistent Stare Exploitation and Analysis System (PerSEAS) that will significantly augment an end-user’s ability to discover novel intelligence using models of activities, normalcy, and context. Since the vast majority of events are normal and pose no threat, the models must cross-integrate singular events to discover relationships and anomalies that are indicative of suspicious behavior or match previously learned – or defined – threat activity.

The advanced PerSEAS system will markedly improve an analyst’s ability to handle burgeoning WAMI data and reduce the time required to perform many current exploitation tasks, greatly enhancing the military’s capability to use the data for forensic analysis and to issue timely threat alerts with a minimal number of false alarms.

Due to the complex, multi-disciplinary nature of the research, Kitware will partner with academic experts in the fields of computer vision, probabilistic reasoning, machine learning and other related domains. Phase I of the research is expected to be completed in two years.

The awarded contract will expand Kitware’s leadership in the field of computer vision, video analysis and advanced visualization software. The project will build upon our previous DARPA-sponsored research into content-based video retrieval on the VIRAT program; anomaly detection on the PANDA program; and the recognition of complex multi-agent activities in video.

To meet the PerSEAS program’s needs, Kitware has assembled a world-class team including four leading defense technology companies (Northrop Grumman Corporation; Honeywell Automation and Control Solutions Laboratories; Aptima, Inc.; and Navia, Inc.) as well as multiple internationally renowned research institutions: the University of California, Berkeley; the Computer Vision Laboratory, University of Maryland; Rensselaer Polytechnic Institute; the Computer Vision Lab at the University of Central Florida; the School of Interactive Computing at Georgia Tech and its affiliated Center for Robotics & Intelligent Machines; and Columbia University.




June 1st, 2010 Irfan Essa Posted in Grant Schindler, PERSEAS, Visual Surveillance

 Persistent Stare Exploitation and Analysis System (PerSEAS)

Our group is part of the team led by Kitware Inc. to work on the Defense Advanced Research Projects Agency (DARPA) Persistent Stare Exploitation and Analysis System (PerSEAS) program.

The Persistent Stare Exploitation and Analysis System (PerSEAS) program is developing the capability to automatically and interactively identify potential threats as they emerge, based on the correlation of multiple disparate activities and events in wide area motion imagery (WAMI) and multi-INT data. PerSEAS will enable new methods of threat hypothesis adjudication and forensic analysis through activity-based modeling and inferencing capabilities.



Paper (2009): ICASSP “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection”

February 4th, 2009 Irfan Essa Posted in 0205507, Face and Gesture, ICASSP, James Rehg, Machine Learning, Pei Yin, Thad Starner

Pei Yin, Thad Starner, Harley Hamilton, Irfan Essa, James M. Rehg (2009), “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection” in IEEE International Conference on Acoustics, Speech, and Signal Processing 2009 (ICASSP 2009). Session: Spoken Language Understanding I, Tuesday, April 21, 11:00 – 13:00, Taipei, Taiwan.


The natural language for most deaf signers in the United States is American Sign Language (ASL). ASL has internal structure like spoken languages, and ASL linguists have introduced several phonemic models. The study of ASL phonemes is not only interesting to linguists but also useful for scalability in machine recognition. Since machine perception differs from human perception, this paper learns the basic units of ASL directly from data. Compared with previous studies, our approach computes a set of data-driven units (fenemes) discriminatively from the results of segmental feature selection. The learning iterates the following two steps: first, apply discriminative feature selection segmentally to the signs; then, tie the most similar temporal segments and re-train. Intuitively, sign parts that are indistinguishable to machines are merged to form basic units, which we call ASL fenemes. Experiments on publicly available ASL recognition data show that the extracted data-driven fenemes are meaningful, and that recognition using these fenemes achieves improved accuracy at reduced model complexity.
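
The tying step can be pictured as agglomerative merging of segment models. The sketch below is a simplification under stated assumptions: each temporal segment is summarized by a mean feature vector, and "most similar" is taken to be the smallest Euclidean distance between means, whereas the paper derives similarity from discriminative segmental feature selection. `tie_segments` and its arguments are hypothetical names.

```python
import math


def tie_segments(segment_means, n_units):
    """Agglomeratively tie the two closest segment models until n_units remain.

    segment_means: one mean feature vector per temporal segment (assumed input).
    Returns a list mapping each input segment to a unit id in 0..n_units-1.
    """
    # Each cluster: (mean vector, member count, member segment indices)
    clusters = [(list(m), 1, [i]) for i, m in enumerate(segment_means)]
    while len(clusters) > n_units:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (mi, ni, si), (mj, nj, sj) = clusters[i], clusters[j]
        # The tied unit's mean is the count-weighted average of both members
        merged_mean = [(ni * a + nj * b) / (ni + nj) for a, b in zip(mi, mj)]
        del clusters[j]  # j > i, so index i stays valid
        clusters[i] = (merged_mean, ni + nj, si + sj)
    labels = [0] * len(segment_means)
    for unit_id, (_, _, members) in enumerate(clusters):
        for seg in members:
            labels[seg] = unit_id
    return labels
```

In the iterative scheme described above, the tied units would then be re-trained before the next round of feature selection; that re-training is not shown.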


Paper: ICASSP (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”

April 3rd, 2008 Irfan Essa Posted in 0205507, Face and Gesture, Funding, James Rehg, Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, Pei Yin, Thad Starner

Pei Yin, Irfan Essa, James Rehg, Thad Starner (2008) “Discriminative Feature Selection for Hidden Markov Models using Segmental Boosting”, ICASSP 2008 – March 30 – April 4, 2008 – Las Vegas, Nevada, U.S.A. (Paper: MLSP-P3.D8, Session: Pattern Recognition and Classification II, Time: Thursday, April 3, 15:30 – 17:30, Topic: Machine Learning for Signal Processing: Learning Theory and Modeling) (PDF|Project Site)


We address the feature selection problem for hidden Markov models (HMMs) in sequence classification. Temporal correlation in sequences often makes it difficult to apply feature selection techniques. Inspired by segmental k-means segmentation (SKS), we propose Segmentally Boosted HMMs (SBHMMs), in which state-optimized features are constructed in a segmental and discriminative manner. The contributions are twofold. First, we introduce a novel feature selection algorithm in which the temporal dynamics are decoupled from the static learning procedure by assuming that the sequential data are piecewise independent and identically distributed. Second, we show that the SBHMM consistently improves on traditional HMM recognition in various domains. The reduction in error compared to traditional HMMs ranges from 17% to 70% in American Sign Language recognition, human gait identification, lip reading, and speech recognition.
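
The piecewise i.i.d. assumption is what lets a static learner do the feature selection: once each sequence is segmented into states, every frame can be treated as an independent sample labelled by its (class, state) pair. The sketch below illustrates only that idea. It assumes a uniform segmentation (the usual starting point for segmental k-means) in place of iterative HMM re-alignment, and plain discrete AdaBoost with decision stumps for one target state versus the rest. All function names are hypothetical; this is not the authors' SBHMM implementation.

```python
import math


def uniform_segment(n_frames, n_states):
    """Split a sequence into n_states equal parts; return each frame's state index."""
    return [min(t * n_states // n_frames, n_states - 1) for t in range(n_frames)]


def segmental_frames(sequences, n_states):
    """Pool frames from (class, sequence) pairs, labelling each frame (class, state).
    Within a segment, frames are treated as i.i.d. samples."""
    X, labels = [], []
    for cls, seq in sequences:
        for frame, state in zip(seq, uniform_segment(len(seq), n_states)):
            X.append(frame)
            labels.append((cls, state))
    return X, labels


def _stump(x, dim, thr, pol):
    return pol if x[dim] >= thr else -pol


def best_stump(X, y, w):
    """Exhaustively find the decision stump with the lowest weighted error."""
    best = None
    for dim in range(len(X[0])):
        for thr in sorted({x[dim] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if _stump(xi, dim, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, dim, thr, pol)
    return best


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost with stumps, labels y in {-1, +1}.
    Returns [(alpha, dim, thr, pol)]; the dims used are the selected features."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, dim, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log finite for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, dim, thr, pol))
        # Up-weight misclassified frames, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * _stump(xi, dim, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model
```

To select features for one state, label its frames +1 and all other frames -1, run `adaboost`, and read off the dimensions of the chosen stumps.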


Funding: NSF (2008) “Symposium on Computation and Journalism”

March 8th, 2008 Irfan Essa Posted in Computational Journalism, Funding

Award#0813831 – Symposium on Computation and Journalism


Fundamentally, journalism is aimed at collecting news information and disseminating that information with a layer of contextualization and understanding provided by journalists. Recent advances in computational technology are rapidly affecting how news information is gathered, reported and distributed. Furthermore, new avenues for aggregating, visualizing, summarizing, consuming, and collaborating on news are becoming increasingly popular and are challenging traditional practices of journalism. Following the success of text search, image and video search are now poised to make a bigger impact on journalism and other related fields. Computation and journalism each have a deep-rooted interest in information and the value it provides to society. The concept of Information Quality, the measure of the value that information provides to its user, brings these two disciplines together. In computing and information sciences, information quality describes the degree of excellence in communicating knowledge or intelligence and is composed of facets such as accuracy, reliability, comprehensiveness, currency, and validity. In journalism, where the conveyance of quality information is paramount, principles such as accuracy, fairness, thoroughness, and transparency guide journalists in communicating quality information. Traditionally, journalism has also entailed an ethos of working on the side of the citizenry, providing citizens with the quality information they need to make informed decisions in their daily lives. However, the plethora of unvetted blogs, podcasts, videos and other online media, generated by users or by corporations with subjective biases, has led to a significant compromise in information quality. Collaborative knowledge generation (e.g., Wikipedia) and citizen journalism are showing new ways in which information and (global) news can be shared.
However, as the Web and the Internet continue to grow and computing technologies pervade the planet, a thorough study of the process of journalism and of its deep computational aspects needs to be undertaken. To this end, the PI’s research group at the Georgia Institute of Technology is interested in understanding how computational advances impact the field of journalism. The long-term aim is to make novel contributions by developing computational technologies that better support the goals of journalism. To launch this effort, the group is organizing a Symposium on Computation + Journalism at Georgia Tech in Atlanta, GA, February 22-23, 2008. The goal of this symposium is to bring together stakeholders from all aspects of journalism, media, and computation. Participants in panels, presentations and breakout groups will discuss these issues and create a roadmap toward answering the questions that bring computation and journalism together.


Thesis: Mitch Parry PhD (2007), “Separation and Analysis of Multichannel Signals”

October 9th, 2007 Irfan Essa Posted in 0205507, Audio Analysis, Funding, Mitch Parry, PhD, Thesis

Mitch Parry (2007), “Separation and Analysis of Multichannel Signals,” PhD Thesis [PDF], Georgia Institute of Technology, College of Computing, Atlanta, GA. (Advisor: Irfan Essa)


This thesis examines a large and growing class of digital signals that capture the combined effect of multiple underlying factors. In order to better understand these signals, we would like to separate and analyze the underlying factors independently. Although source separation applies to a wide variety of signals, this thesis focuses on separating individual instruments from a musical recording. In particular, we propose novel algorithms for separating instrument recordings given only their mixture. When the number of source signals does not exceed the number of mixture signals, we focus on a subclass of source separation algorithms based on joint diagonalization. Each approach leverages a different form of source structure. We introduce repetitive structure as an alternative that leverages unique repetition patterns in music and compare its performance against the other techniques.
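
Joint diagonalization is easiest to see in the two-source, two-mixture case. The sketch below follows an AMUSE-style recipe, one of the simplest joint-diagonalization methods (it is not the repetitive-structure approach introduced in the thesis): whiten the mixtures so the zero-lag covariance becomes the identity, then rotate so that a symmetrized lagged covariance is diagonal as well. It assumes zero-mean, mutually uncorrelated sources whose autocorrelations differ at the chosen lag. `amuse_2x2` and its helpers are hypothetical names, written in pure Python with a closed-form 2×2 eigendecomposition.

```python
import math


def _sym2_eig(a, b, c):
    """Eigendecomposition of the symmetric matrix [[a, b], [b, c]] by one Jacobi
    rotation. Returns (eigenvalues, orthonormal eigenvectors as rows)."""
    th = 0.5 * math.atan2(2 * b, a - c)
    co, si = math.cos(th), math.sin(th)
    l1 = a * co * co + 2 * b * si * co + c * si * si
    l2 = a * si * si - 2 * b * si * co + c * co * co
    return (l1, l2), ((co, si), (-si, co))


def _cov(u, v, lag=0):
    """Average of u[t] * v[t + lag] over the overlapping samples."""
    n = len(u) - lag
    return sum(u[t] * v[t + lag] for t in range(n)) / n


def amuse_2x2(x1, x2, lag=1):
    """Separate two sources from two mixtures by jointly diagonalizing the
    zero-lag covariance (via whitening) and a symmetrized lagged covariance."""
    m1, m2 = sum(x1) / len(x1), sum(x2) / len(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    # 1) Whitening: diagonalize the zero-lag covariance, rescale to unit variance
    (l1, l2), (e1, e2) = _sym2_eig(_cov(x1, x1), _cov(x1, x2), _cov(x2, x2))
    z1 = [(e1[0] * a + e1[1] * b) / math.sqrt(l1) for a, b in zip(x1, x2)]
    z2 = [(e2[0] * a + e2[1] * b) / math.sqrt(l2) for a, b in zip(x1, x2)]
    # 2) Rotation: diagonalize the symmetrized lagged covariance of whitened data;
    #    an orthogonal rotation keeps the zero-lag covariance equal to identity
    c11 = _cov(z1, z1, lag)
    c22 = _cov(z2, z2, lag)
    c12 = 0.5 * (_cov(z1, z2, lag) + _cov(z2, z1, lag))
    _, (u1, u2) = _sym2_eig(c11, c12, c22)
    y1 = [u1[0] * a + u1[1] * b for a, b in zip(z1, z2)]
    y2 = [u2[0] * a + u2[1] * b for a, b in zip(z1, z2)]
    return y1, y2
```

The recovered signals come back only up to permutation, sign, and scale, which is inherent to blind separation.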

When the number of source signals exceeds the number of mixtures (i.e., the underdetermined problem), we focus on spectrogram factorization techniques for source separation. We extend single-channel techniques to utilize the additional spatial information in multichannel recordings, and use phase information to improve the estimation of the underlying components.
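
Spectrogram factorization in its simplest form is nonnegative matrix factorization (NMF) of a magnitude spectrogram. The single-channel sketch below uses the standard Lee-Seung multiplicative updates for the squared Euclidean cost; it does not include the multichannel spatial or phase extensions described in the thesis. The `nmf` function, its parameters, and the random initialization are assumptions for illustration.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative spectrogram V (freq x time) as W @ H, where
    W holds spectral templates (freq x rank) and H activations (rank x time).
    Lee-Seung multiplicative updates never increase the squared error."""
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + eps for _ in range(T)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H
```

Each column of `W` can be read as one component's spectral shape and the matching row of `H` as its gain over time; grouping components into sources is a separate step.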

via Separation and Analysis of Multichannel Signals.


Paper: IEEE CVPR (2007) “Tree-based Classifiers for Bilayer Video Segmentation”

June 17th, 2007 Irfan Essa Posted in 0205507, Antonio Criminisi, Computational Photography and Video, Funding, John Winn, Machine Learning, Papers, Pei Yin, Research

Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan (2007), “Tree-based Classifiers for Bilayer Video Segmentation,” In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007 (CVPR ’07), 17-22 June 2007, page(s): 1 – 8, Location: Minneapolis, MN, USA, ISBN: 1-4244-1180-7, Digital Object Identifier: 10.1109/CVPR.2007.383008


This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a novel motion representation, “motons”, inspired by research in object recognition. Second, we propose learning the segmentation likelihood from the spatial context of motion. The learning is efficiently performed by Random Forests. The third contribution is a general taxonomy of tree-based classifiers, which facilitates theoretical and experimental comparisons of several known classification algorithms, as well as spawning new ones. Diverse visual cues such as motion, motion context, colour, contrast and spatial priors are fused together by means of a Conditional Random Field (CRF) model. Segmentation is then achieved by binary min-cut. Our algorithm requires no initialization. Experiments on many video-chat type sequences demonstrate the effectiveness of our algorithm in a variety of scenes. The segmentation results are comparable to those obtained by stereo systems.
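
The final labelling stage, binary min-cut over a CRF energy, can be shown on a toy 1-D chain of pixels. The sketch assumes the unary costs (which in the paper would come from fused cues such as motons, colour, and contrast) are already given, uses a single Potts smoothness weight, and solves the s-t min-cut with a plain Edmonds-Karp max-flow. `min_cut_labels` and its parameters are hypothetical; a real system would use a 2-D pixel grid and a faster max-flow solver.

```python
from collections import deque


def min_cut_labels(fg_cost, bg_cost, smooth):
    """Binary labelling of a 1-D chain of pixels by s-t min-cut.

    fg_cost[p] / bg_cost[p]: unary cost of labelling pixel p foreground / background.
    smooth: Potts penalty paid when neighbours p and p+1 take different labels.
    Returns a list of booleans (True = foreground = source side of the cut).
    """
    n = len(fg_cost)
    S, T = n, n + 1
    cap = {}

    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)  # residual (reverse) edge

    for p in range(n):
        add(S, p, bg_cost[p])  # cut when p ends on the sink (background) side
        add(p, T, fg_cost[p])  # cut when p stays on the source (foreground) side
    for p in range(n - 1):
        add(p, p + 1, smooth)
        add(p + 1, p, smooth)
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)

    def bfs():
        """Parents of all nodes reachable from S through positive residual capacity."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until T is unreachable
    while True:
        parent = bfs()
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[e] for e in path)
        for (u, v) in path:
            cap[(u, v)] -= flow
            cap[(v, u)] += flow
    reachable = bfs()  # source side of a minimum cut
    return [p in reachable for p in range(n)]
```

With a positive `smooth`, an isolated pixel whose own evidence weakly favours background is pulled into a surrounding foreground region, which is the kind of spatial coherence the CRF provides.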


Paper: IEEE ICASSP (2007) “Incorporating Phase Information for Source Separation via Spectrogram Factorization”

April 15th, 2007 Irfan Essa Posted in 0205507, Audio Analysis, Funding, Mitch Parry, Papers, Research

Parry, R.M.; Essa, I. (2007) “Incorporating Phase Information for Source Separation via Spectrogram Factorization.” In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. 15-20 April 2007, Volume: 2, page(s): II-661 – II-66, Honolulu, HI, ISSN: 1520-6149, ISBN: 1-4244-0728-1, INSPEC Accession Number: 9497202, Digital Object Identifier: 10.1109/ICASSP.2007.366322


Spectrogram factorization methods have been proposed for single-channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is discarded, and the resulting spectrogram matrix is factored into a sum of rank-one source spectrograms. This approach incorrectly assumes that the mixture spectrogram is the sum of the source spectrograms; in fact, the mixture spectrogram depends on the phases of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve separation results.
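
The assumption can be checked on a single time-frequency bin. If two sources have STFT values X1 and X2 with phase difference Δφ, then |X1 + X2|² = |X1|² + |X2|² + 2|X1||X2|cos Δφ, so the mixture magnitude equals |X1| + |X2| only when the phases agree. The sketch below (function names hypothetical) evaluates this, and also shows that averaging over uniformly spaced phase differences makes the powers, not the magnitudes, add. It is only a worked illustration of the phase dependence, not the probabilistic phase model in the paper.

```python
import cmath
import math


def mixture_magnitude(mag1, mag2, phase_diff):
    """|X1 + X2| for one STFT bin, given source magnitudes mag1, mag2
    and the phase difference phase_diff (radians) between the two sources."""
    x1 = complex(mag1, 0.0)
    x2 = mag2 * cmath.exp(1j * phase_diff)
    return abs(x1 + x2)


def mean_power_uniform_phase(mag1, mag2, n=360):
    """Average |X1 + X2|^2 over n equally spaced phase differences.
    The cross term 2*mag1*mag2*cos(phase_diff) averages to zero."""
    return sum(mixture_magnitude(mag1, mag2, 2 * math.pi * k / n) ** 2
               for k in range(n)) / n
```

With magnitudes 3 and 4, additive magnitudes would predict 7 (power 49), while a single bin can take any magnitude from 1 to 7 depending on the phase difference, and the phase-averaged power is 9 + 16 = 25.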


Paper: IEEE ICASSP (2006) “Source Detection Using Repetitive Structure”

May 14th, 2006 Irfan Essa Posted in 0205507, Audio Analysis, Funding, Mitch Parry, Papers, Research

Parry, R.M.; Essa, I. (2006) “Source Detection Using Repetitive Structure.” Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006, Publication Date: 14-19 May 2006, Volume: 4, page(s): IV – IV, Location: Toulouse, ISSN: 1520-6149, ISBN: 1-4244-0469-X, INSPEC Accession Number: 9154520, Digital Object Identifier: 10.1109/ICASSP.2006.1661163


Blind source separation algorithms typically require that the number of sources be known in advance. However, the number of sources often changes over time, and the total number is not known. Existing source separation techniques therefore rely on source number estimation methods to determine how many sources are active within the mixture signals. These methods typically operate on the covariance matrix of the mixture recordings and require fewer active sources than mixtures. When sources do not overlap in the time-frequency domain, more sources than mixtures may be detected and then separated. However, separating more sources than mixtures when sources overlap in time and frequency is a particularly difficult problem. This paper addresses source detection when more sources than sensors overlap in time and frequency. We show that repetitive structure, in the form of time-time correlation matrices, can reveal when each source is active.
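
A time-time correlation matrix compares every spectrogram frame with every other frame, so a source that repeats produces stripes parallel to the main diagonal. The sketch below computes the matrix with cosine similarity and averages each diagonal to give a per-lag profile. The similarity measure, the `lag_profile` summary, and the function names are assumptions for illustration; the paper's actual detection procedure is not reproduced here.

```python
import math


def time_time_correlation(frames):
    """R[i][j] = cosine similarity between spectrogram frames i and j."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in frames]  # avoid /0
    n = len(frames)
    return [[sum(a * b for a, b in zip(frames[i], frames[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def lag_profile(R):
    """Average each super-diagonal of R; a high value at lag L suggests
    that something in the mixture repeats every L frames."""
    n = len(R)
    return {lag: sum(R[i][i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(1, n)}
```

Restricting the same computation to a sliding window of frames would indicate when a repeating pattern is present, which is the kind of activity cue the abstract refers to.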


Paper: IEEE CVPR (2004) “Asymmetrically boosted HMM for speech reading”

June 2nd, 2004 Irfan Essa Posted in 0205507, Funding, James Rehg, Papers, Pei Yin

Pei Yin; Essa, I.; Rehg, J.M. (2004) “Asymmetrically boosted HMM for speech reading,” In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004 (CVPR 2004). Publication Date: 27 June-2 July 2004, Volume: 2, On page(s): II-755 – II-761 Vol.2, ISSN: 1063-6919, ISBN: 0-7695-2158-, INSPEC Accession Number: 8161546, Digital Object Identifier: 10.1109/CVPR.2004.1315240


Speech reading, also known as lip reading, aims to extract visual cues from lip and facial movements to aid in speech recognition. The main hurdle for speech reading is that visual measurements of lip and facial motion lack information-rich features like the Mel-frequency cepstral coefficients (MFCCs) widely used in acoustic speech recognition. MFCCs are used with hidden Markov models (HMMs) in most current speech recognition systems. Speech reading could greatly benefit from automatic selection and formation of informative features from measurements in the visual domain. These new features can then be used with HMMs to capture the dynamics of lip movement and, ultimately, to recognize lip shapes. Towards this end, we use AdaBoost methods for automatic visual feature formation. Specifically, we design an asymmetric variant of the AdaBoost.M2 algorithm to deal with the ill-posed multi-class sample distribution inherent in our problem. Our experiments show that the boosted HMM approach outperforms conventional AdaBoost and HMM classifiers. Our primary contributions are the design of (a) a boosted HMM and (b) asymmetric multi-class boosting.
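
Recognition with HMMs, boosted or not, ultimately scores an observation sequence under each class model and picks the most likely one. The sketch below shows that scoring step with the scaled forward algorithm for a discrete-output HMM. It assumes the boosted visual features have already been quantized into symbols, which is a simplification, and `forward_log_likelihood` is a hypothetical name. It does not implement the asymmetric AdaBoost.M2 variant.

```python
import math


def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | HMM) via the scaled forward algorithm.

    pi[i]: initial state probability, A[i][j]: transition i -> j,
    B[i][k]: probability of emitting symbol k in state i, obs: symbol list.
    Raises ValueError (log of 0) if obs is impossible under the model.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through transitions, then weight by the emission probability
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)  # equals P(obs[t] | obs[:t])
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik
```

A classifier would evaluate this for each lip-shape model and choose the model with the highest log-likelihood.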
