
Paper in CVIU 2013 “A Visualization Framework for Team Sports Captured using Multiple Static Cameras”

October 3rd, 2013 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Jessica Hodgins, PAMI/ICCV/CVPR/ECCV, Papers, Raffay Hamid, Sports Visualization

  • R. Hamid, R. Kumar, J. Hodgins, and I. Essa (2013), “A Visualization Framework for Team Sports Captured using Multiple Static Cameras,” Computer Vision and Image Understanding, 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
    @article{2013-Hamid-VFTSCUMSC,
      Author = {Raffay Hamid and Ramkrishan Kumar and Jessica Hodgins and Irfan Essa},
      Date-Added = {2013-10-22 13:42:46 +0000},
      Date-Modified = {2013-10-22 13:51:43 +0000},
      Doi = {10.1016/j.cviu.2013.09.006},
      Issn = {1077-3142},
      Journal = {Computer Vision and Image Understanding},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Hamid-VFTSCUMSC.pdf},
      Title = {A Visualization Framework for Team Sports Captured using Multiple Static Cameras},
      Url = {http://raffayhamid.com/sports_viz.shtml},
      Video = {http://www.youtube.com/watch?v=VwzAMi9pUDQ},
      Year = {2013},
      Bdsk-Url-1 = {http://www.sciencedirect.com/science/article/pii/S1077314213001768},
      Bdsk-Url-2 = {http://dx.doi.org/10.1016/j.cviu.2013.09.006},
      Bdsk-Url-3 = {http://raffayhamid.com/sports_viz.shtml}}

Abstract

We present a novel approach for robust localization of multiple people observed using a set of static cameras. We use this location information to generate a visualization of the virtual offside line in soccer games. To compute the position of the offside line, we need to localize players' positions, and identify their team roles. We solve the problem of fusing corresponding players' positional information by finding minimum weight K-length cycles in a complete K-partite graph. Each partite of the graph corresponds to one of the K cameras, whereas each node of a partite encodes the position and appearance of a player observed from a particular camera. To find the minimum weight cycles in this graph, we use a dynamic programming based approach that varies over a continuum from maximally to minimally greedy in terms of the number of graph-paths explored at each iteration. We present proofs for the efficiency and performance bounds of our algorithms. Finally, we demonstrate the robustness of our framework by testing it on 82,000 frames of soccer footage captured over eight different illumination conditions, play types, and team attire. Our framework runs in near-real time, and processes video from 3 full HD cameras in about 0.4 seconds for each set of 3 corresponding frames.
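The K-partite cycle search described above can be illustrated in a few lines. The sketch below is not the authors' implementation: it finds a low-weight cycle visiting one node per partite set, with a `beam` parameter that mirrors the paper's continuum from maximally greedy (`beam=1`) to exhaustive (`beam=None`). The node values and cost function here are hypothetical stand-ins for player position/appearance distances.

```python
def min_weight_k_cycle(parties, cost, beam=None):
    """Find a low-weight cycle visiting one node from each partite set.

    parties: list of K lists of nodes (one list per camera).
    cost: function(node_a, node_b) -> edge weight.
    beam: partial paths kept per step; 1 = maximally greedy,
          None = keep all partial paths (exhaustive enumeration).
    """
    best_cycle, best_w = None, float("inf")
    # Fix the cycle's start in the first partite set to avoid counting
    # rotations of the same cycle more than once.
    for start in parties[0]:
        paths = [([start], 0.0)]
        for part in parties[1:]:
            # Extend every kept partial path into the next partite set.
            ext = [(p + [n], w + cost(p[-1], n)) for (p, w) in paths for n in part]
            ext.sort(key=lambda t: t[1])
            paths = ext if beam is None else ext[:beam]
        for p, w in paths:
            w_closed = w + cost(p[-1], p[0])  # close the cycle
            if w_closed < best_w:
                best_cycle, best_w = p, w_closed
    return best_cycle, best_w
```

With scalar "nodes" and an absolute-difference cost, `min_weight_k_cycle([[0], [1, 10], [2, 20]], lambda a, b: abs(a - b))` returns the cycle `[0, 1, 2]`, i.e. the mutually consistent detections across the three "cameras".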

via Science Direct – A Visualization Framework for Team Sports Captured using Multiple Static Cameras.


Paper in ACM Ubicomp 2013 “Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras”

September 14th, 2013 Irfan Essa Posted in Activity Recognition, Computational Photography and Video, Edison Thomaz, Gregory Abowd, ISWC, Mobile Computing, Papers, Ubiquitous Computing

  • E. Thomaz, A. Parnami, J. Bidwell, I. Essa, and G. D. Abowd (2013), “Technological Approaches for Addressing Privacy Concerns when Recognizing Eating Behaviors with Wearable Cameras,” in Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’13), 2013. [PDF] [DOI] [BIBTEX]
    @inproceedings{2013-Thomaz-TAAPCWREBWWC,
      Author = {Edison Thomaz and Aman Parnami and Jonathan Bidwell and Irfan Essa and Gregory D. Abowd},
      Booktitle = {Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '13)},
      Date-Added = {2013-10-22 18:31:23 +0000},
      Date-Modified = {2013-10-22 19:19:14 +0000},
      Doi = {10.1145/2493432.2493509},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Thomaz-TAAPCWREBWWC.pdf},
      Title = {Technological Approaches for Addressing Privacy Concerns when Recognizing Eating Behaviors with Wearable Cameras},
      Year = {2013},
      Bdsk-Url-1 = {http://dx.doi.org/10.1145/2493432.2493509}}

Abstract

First-person point-of-view (FPPOV) images taken by wearable cameras can be used to better understand people’s eating habits. Human computation is a way to provide effective analysis of FPPOV images in cases where algorithmic approaches currently fail. However, privacy is a serious concern. We provide a framework, the privacy-saliency matrix, for understanding the balance between the eating information in an image and its potential privacy concerns. Using data gathered by 5 participants wearing a lanyard-mounted smartphone, we show how the framework can be used to quantitatively assess the effectiveness of four automated techniques (face detection, image cropping, location filtering and motion filtering) at reducing the privacy-infringing content of images while still maintaining evidence of eating behaviors throughout the day.
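The privacy-saliency matrix can be thought of as a 2×2 contingency table over two binary frame labels: does the frame carry eating evidence, and does it carry privacy-infringing content? A minimal sketch, assuming hand-labeled frames; the cell names and the effectiveness measure below are our own illustration, not the paper's exact formulation:

```python
from collections import Counter

def privacy_saliency_cell(has_eating_evidence, has_private_content):
    """Map a frame's two binary labels to a cell of the 2x2 matrix."""
    saliency = "eating" if has_eating_evidence else "no-eating"
    privacy = "private" if has_private_content else "public"
    return (saliency, privacy)

def matrix_counts(frames):
    """frames: iterable of (has_eating, has_private) label pairs."""
    return Counter(privacy_saliency_cell(e, p) for e, p in frames)

def filter_effectiveness(before, after):
    """Compare matrix counts before/after an automated filter:
    fraction of private frames removed vs. eating evidence retained."""
    priv = lambda c: sum(v for (s, p), v in c.items() if p == "private")
    eat = lambda c: sum(v for (s, p), v in c.items() if s == "eating")
    removed = 1 - priv(after) / priv(before) if priv(before) else 0.0
    retained = eat(after) / eat(before) if eat(before) else 0.0
    return removed, retained
```

A filter such as face detection would ideally push `removed` toward 1.0 while keeping `retained` high.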

via ACM DL – Technological approaches for addressing privacy concerns when recognizing eating behaviors with wearable cameras.


Paper in IEEE CVPR 2013 “Geometric Context from Videos”

June 27th, 2013 Irfan Essa Posted in Matthias Grundmann, PAMI/ICCV/CVPR/ECCV, Papers, S. Hussain Raza

  • S. H. Raza, M. Grundmann, and I. Essa (2013), “Geometric Context from Video,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
    @inproceedings{2013-Raza-GCFV,
      Author = {Syed Hussain Raza and Matthias Grundmann and Irfan Essa},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2013-06-25 11:46:01 +0000},
      Date-Modified = {2013-10-22 18:40:01 +0000},
      Doi = {10.1109/CVPR.2013.396},
      Month = {June},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Raza-GCFV.pdf},
      Title = {Geometric Context from Video},
      Url = {http://www.cc.gatech.edu/cpl/projects/videogeometriccontext/},
      Video = {http://www.youtube.com/watch?v=EXPmgKHPJ64},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/videogeometriccontext/},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/CVPR.2013.396}}

Abstract

We present a novel algorithm for estimating the broad 3D geometric structure of outdoor video scenes. Leveraging spatio-temporal video segmentation, we decompose a dynamic scene captured by a video into geometric classes, based on predictions made by region-classifiers that are trained on appearance and motion features. By examining the homogeneity of the prediction, we combine predictions across multiple segmentation hierarchy levels alleviating the need to determine the granularity a priori. We built a novel, extensive dataset on geometric context of video to evaluate our method, consisting of over 100 ground-truth annotated outdoor videos with over 20,000 frames. To further scale beyond this dataset, we propose a semi-supervised learning framework to expand the pool of labeled data with high confidence predictions obtained from unlabeled data. Our system produces an accurate prediction of geometric context of video achieving 96% accuracy across main geometric classes.
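Combining predictions across segmentation hierarchy levels "by examining the homogeneity of the prediction" can be read as confidence-weighted averaging over levels. The sketch below is one plausible formalization of that idea (our assumption, not the authors' exact rule): each level's per-pixel class distribution is weighted by its confidence, measured as one minus its normalized entropy, so no single granularity needs to be chosen a priori.

```python
import numpy as np

def combine_hierarchy(preds):
    """preds: list of (n_pixels, n_classes) probability maps, one per
    segmentation hierarchy level. Weight each level by its per-pixel
    confidence (1 - normalized entropy), then renormalize."""
    combined = np.zeros_like(preds[0])
    for P in preds:
        # Normalized entropy in [0, 1]; low entropy => confident level.
        ent = -(P * np.log(P + 1e-12)).sum(axis=1, keepdims=True)
        w = 1.0 - ent / np.log(P.shape[1])
        combined += w * P
    return combined / combined.sum(axis=1, keepdims=True)
```

A confident fine level then dominates an uncertain coarse one for the same pixel, and vice versa.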

via IEEE Xplore – Geometric Context from Videos.


Paper in IEEE CVPR 2013 “Decoding Children’s Social Behavior”

June 27th, 2013 Irfan Essa Posted in Affective Computing, Behavioral Imaging, Denis Lantsman, Gregory Abowd, James Rehg, PAMI/ICCV/CVPR/ECCV, Papers, Thomas Ploetz

  • J. M. Rehg, G. D. Abowd, A. Rozga, M. Romero, M. A. Clements, S. Sclaroff, I. Essa, O. Y. Ousley, Y. Li, C. Kim, H. Rao, J. C. Kim, L. L. Presti, J. Zhang, D. Lantsman, J. Bidwell, and Z. Ye (2013), “Decoding Children’s Social Behavior,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
    @inproceedings{2013-Rehg-DCSB,
      Author = {James M. Rehg and Gregory D. Abowd and Agata Rozga and Mario Romero and Mark A. Clements and Stan Sclaroff and Irfan Essa and Opal Y. Ousley and Yin Li and Chanho Kim and Hrishikesh Rao and Jonathan C. Kim and Liliana Lo Presti and Jianming Zhang and Denis Lantsman and Jonathan Bidwell and Zhefan Ye},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2013-06-25 11:47:42 +0000},
      Date-Modified = {2013-10-22 18:50:31 +0000},
      Doi = {10.1109/CVPR.2013.438},
      Month = {June},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~rehg/Papers/Rehg_CVPR13.pdf},
      Title = {Decoding Children's Social Behavior},
      Url = {http://www.cbi.gatech.edu/mmdb/},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cbi.gatech.edu/mmdb/},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/CVPR.2013.438}}

Abstract

We introduce a new problem domain for activity recognition: the analysis of children’s social and communicative behaviors based on video and audio data. We specifically target interactions between children aged 1-2 years and an adult. Such interactions arise naturally in the diagnosis and treatment of developmental disorders such as autism. We introduce a new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction. In each session, the adult examiner followed a semi-structured play interaction protocol which was designed to elicit a broad range of social behaviors. We identify the key technical challenges in analyzing these behaviors, and describe methods for decoding the interactions. We present experimental results that demonstrate the potential of the dataset to drive interesting research questions, and show preliminary results for multi-modal activity recognition.

Full database available from http://www.cbi.gatech.edu/mmdb/

via IEEE Xplore – Decoding Children’s Social Behavior.


Paper in IEEE CVPR 2013 “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition”

June 27th, 2013 Irfan Essa Posted in Activity Recognition, Behavioral Imaging, Grant Schindler, PAMI/ICCV/CVPR/ECCV, Papers, Sports Visualization, Thomas Ploetz, Vinay Bettadapura

  • V. Bettadapura, G. Schindler, T. Ploetz, and I. Essa (2013), “Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [PDF] [WEBSITE] [DOI] [BIBTEX]
    @inproceedings{2013-Bettadapura-ABDDTSIAR,
      Author = {Vinay Bettadapura and Grant Schindler and Thomas Ploetz and Irfan Essa},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2013-06-25 11:42:31 +0000},
      Date-Modified = {2013-10-22 18:39:15 +0000},
      Doi = {10.1109/CVPR.2013.338},
      Month = {June},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Bettadapura-ABDDTSIAR.pdf},
      Title = {Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity Recognition},
      Url = {http://www.cc.gatech.edu/cpl/projects/abow/},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/abow/},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/CVPR.2013.338}}

Abstract

We present data-driven techniques to augment Bag of Words (BoW) models, which allow for more robust modeling and recognition of complex long-term activities, especially when the structure and topology of the activities are not known a priori. Our approach specifically addresses the limitations of standard BoW approaches, which fail to represent the underlying temporal and causal information that is inherent in activity streams. In addition, we also propose the use of randomly sampled regular expressions to discover and encode patterns in activities. We demonstrate the effectiveness of our approach in experimental evaluations where we successfully recognize activities and detect anomalies in four complex datasets.
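The temporal side of the augmentation can be illustrated with event n-grams layered on top of a plain BoW histogram; the paper additionally discovers structure via randomly sampled regular expressions, which this sketch omits. The event symbols below are hypothetical, and `events` is assumed to be a sequence (not a one-shot generator):

```python
from collections import Counter
from itertools import islice

def bag_of_words(events):
    """Standard BoW: order-free counts of discrete events."""
    return Counter(events)

def temporal_ngrams(events, n=2):
    """n-grams of consecutive events, retaining local temporal order
    that plain BoW discards."""
    return Counter(zip(*(islice(events, i, None) for i in range(n))))

def augmented_features(events, n=2):
    """BoW histogram augmented with temporal n-gram counts."""
    feats = Counter(bag_of_words(events))
    feats.update(temporal_ngrams(events, n))
    return feats
```

For the event stream `"abab"` this yields unigram counts (`a`: 2, `b`: 2) plus ordered-pair counts (`(a, b)`: 2, `(b, a)`: 1), so activities that emit the same events in different orders become distinguishable.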

via IEEE Xplore – Augmenting Bag-of-Words: Data-Driven Discovery of Temporal and Structural Information for Activity R….


Paper in AISTATS 2013 “Beyond Sentiment: The Manifold of Human Emotions”

April 29th, 2013 Irfan Essa Posted in AAAI/IJCAI/UAI, Behavioral Imaging, Computational Journalism, Numerical Machine Learning, Papers, WWW

  • S. Kim, F. Li, G. Lebanon, and I. A. Essa (2013), “Beyond Sentiment: The Manifold of Human Emotions,” in Proceedings of AISTATS, 2013. [PDF] [BIBTEX]
    @inproceedings{2012-Kim-BSMHE,
      Author = {Seungyeon Kim and Fuxin Li and Guy Lebanon and Irfan A. Essa},
      Booktitle = {Proceedings of AISTATS},
      Date-Added = {2013-06-25 12:01:11 +0000},
      Date-Modified = {2013-06-25 12:02:53 +0000},
      Pdf = {http://arxiv.org/pdf/1202.1568v1},
      Title = {Beyond Sentiment: The Manifold of Human Emotions},
      Year = {2013}}

Abstract

Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without manifold, we are also able to visualize different notions of positive sentiment in different domains.

via [arXiv.org 1202.1568] Beyond Sentiment: The Manifold of Human Emotions.


Paper in ICCP 2013 “Post-processing approach for radiometric self-calibration of video”

April 19th, 2013 Irfan Essa Posted in Computational Photography and Video, ICCP, Matthias Grundmann, Papers, Sing Bing Kang

  • M. Grundmann, C. McClanahan, S. B. Kang, and I. Essa (2013), “Post-processing Approach for Radiometric Self-Calibration of Video,” in Proceedings of IEEE International Conference on Computational Photography, 2013. [PDF] [WEBSITE] [VIDEO] [DOI] [BIBTEX]
    @inproceedings{2013-Grundmann-PARSV,
      Author = {Matthias Grundmann and Chris McClanahan and Sing Bing Kang and Irfan Essa},
      Booktitle = {Proceedings of IEEE International Conference on Computational Photography},
      Date-Added = {2013-06-25 11:54:57 +0000},
      Date-Modified = {2013-10-22 18:41:09 +0000},
      Doi = {10.1109/ICCPhot.2013.6528307},
      Month = {April},
      Organization = {IEEE Computer Society},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2013-Grundmann-PARSV.pdf},
      Title = {Post-processing Approach for Radiometric Self-Calibration of Video},
      Url = {http://www.cc.gatech.edu/cpl/projects/radiometric},
      Video = {http://www.youtube.com/watch?v=sC942ZB4WuM},
      Year = {2013},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/radiometric},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/ICCPhot.2013.6528307}}

Abstract

We present a novel data-driven technique for radiometric self-calibration of video from an unknown camera. Our approach self-calibrates radiometric variations in video, and is applied as a post-process; there is no need to access the camera, and in particular it is applicable to internet videos. This technique builds on empirical evidence that in video the camera response function (CRF) should be regarded time variant, as it changes with scene content and exposure, instead of relying on a single camera response function. We show that a time-varying mixture of responses produces better accuracy and consistently reduces the error in mapping intensity to irradiance when compared to a single response model. Furthermore, our mixture model counteracts the effects of possible nonlinear exposure-dependent intensity perturbations and white-balance changes caused by proprietary camera firmware. We further show how radiometrically calibrated video improves the performance of other video analysis algorithms, enabling a video segmentation algorithm to be invariant to exposure and gain variations over the sequence. We validate our data-driven technique on videos from a variety of cameras and demonstrate the generality of our approach by applying it to internet video.
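The time-variant response idea can be sketched as per-frame least-squares fitting over a basis of inverse response curves, so the mixture weights are free to drift from frame to frame instead of fixing one CRF for the whole video. A simple gamma-curve basis stands in here for a learned empirical basis; the basis and function names are our assumptions, not the paper's model:

```python
import numpy as np

GAMMAS = (1.0, 1.8, 2.2, 2.6)  # hypothetical basis of inverse responses

def crf_basis(intensity, gammas=GAMMAS):
    """Stack basis curves evaluated at the given intensities."""
    return np.stack([intensity ** g for g in gammas], axis=-1)

def fit_frame_response(intensity, irradiance, gammas=GAMMAS):
    """Per-frame least-squares mixture weights over the basis."""
    B = crf_basis(intensity, gammas)
    w, *_ = np.linalg.lstsq(B, irradiance, rcond=None)
    return w

def apply_response(intensity, w, gammas=GAMMAS):
    """Map intensity to irradiance using fitted mixture weights."""
    return crf_basis(intensity, gammas) @ w
```

Fitting `w` independently per frame (or per small window of frames) is what lets the model absorb exposure-dependent perturbations and white-balance changes that a single fixed response cannot.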

via IEEE Xplore – Post-processing approach for radiometric self-calibration of video.


Paper in ECCV Workshop 2012: “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos”

October 7th, 2012 Irfan Essa Posted in Activity Recognition, Awards, Google, Matthias Grundmann, Multimedia, PAMI/ICCV/CVPR/ECCV, Papers, Vivek Kwatra, WWW

Weakly Supervised Learning of Object Segmentations from Web-Scale Videos

  • G. Hartmann, M. Grundmann, J. Hoffman, D. Tsai, V. Kwatra, O. Madani, S. Vijayanarasimhan, I. Essa, J. Rehg, and R. Sukthankar (2012), “Weakly Supervised Learning of Object Segmentations from Web-Scale Videos,” in Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media, 2012. [PDF] [DOI] [BIBTEX]
    @inproceedings{2012-Hartmann-WSLOSFWV,
      Author = {Glenn Hartmann and Matthias Grundmann and Judy Hoffman and David Tsai and Vivek Kwatra and Omid Madani and Sudheendra Vijayanarasimhan and Irfan Essa and James Rehg and Rahul Sukthankar},
      Booktitle = {Proceedings of ECCV 2012 Workshop on Web-scale Vision and Social Media},
      Date-Added = {2012-10-23 15:03:18 +0000},
      Date-Modified = {2013-10-22 18:57:10 +0000},
      Doi = {10.1007/978-3-642-33863-2_20},
      Pdf = {http://www.cs.cmu.edu/~rahuls/pub/eccv2012wk-cp-rahuls.pdf},
      Title = {Weakly Supervised Learning of Object Segmentations from Web-Scale Videos},
      Year = {2012},
      Bdsk-Url-1 = {http://dx.doi.org/10.1007/978-3-642-33863-2_20}}

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classifiers are further refined using graph-cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.

Presented at: ECCV 2012 Workshop on Web-scale Vision and Social Media, October 7-12, 2012, in Florence, Italy.

Awarded the BEST PAPER AWARD!

 


AT UBICOMP 2012 Conference, in Pittsburgh, PA, September 5 – 7, 2012

September 4th, 2012 Irfan Essa Posted in Edison Thomaz, Grant Schindler, Gregory Abowd, Papers, Presentations, Thomas Ploetz, Ubiquitous Computing, Vinay Bettadapura

At ACM sponsored, 14th International Conference on Ubiquitous Computing (Ubicomp 2012), Pittsburgh, PA, September 5 – 7, 2012.

Here are the highlights of my group’s participation in Ubicomp 2012.

  • E. Thomaz, V. Bettadapura, G. Reyes, M. Sandesh, G. Schindler, T. Ploetz, G. D. Abowd, and I. Essa (2012), “Recognizing Water-Based Activities in the Home Through Infrastructure-Mediated Sensing,” in Proceedings of ACM International Conference on Ubiquitous Computing (UBICOMP), 2012. [PDF] [WEBSITE] (Oral Presentation at 2pm on Wednesday September 5, 2012).
  • J. Wang, G. Schindler, and I. Essa (2012), “Orientation Aware Scene Understanding for Mobile Camera,” in Proceedings of ACM International Conference on Ubiquitous Computing (UBICOMP), 2012. [PDF] [WEBSITE] (Oral Presentation at 2pm on Thursday September 6, 2012).

In addition, my colleague Gregory Abowd has a position paper, “What next, Ubicomp? Celebrating an intellectual disappearing act,” in the Wednesday 11:15am session, and my colleague and collaborator Thomas Ploetz has a paper, “Automatic Assessment of Problem Behavior in Individuals with Developmental Disabilities” (with co-authors Nils Hammerla, Agata Rozga, Andrea Reavis, Nathan Call, and Gregory Abowd), in the 9:15am session on Thursday, September 6.


Paper in IEEE CVPR 2012: “Detecting Regions of Interest in Dynamic Scenes with Camera Motions”

June 16th, 2012 Irfan Essa Posted in Activity Recognition, Kihwan Kim, Numerical Machine Learning, PAMI/ICCV/CVPR/ECCV, Papers, PERSEAS, Visual Surveillance

Detecting Regions of Interest in Dynamic Scenes with Camera Motions

  • K. Kim, D. Lee, and I. Essa (2012), “Detecting Regions of Interest in Dynamic Scenes with Camera Motions,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [PDF] [WEBSITE] [VIDEO] [DOI] [BLOG] [BIBTEX]
    @inproceedings{2012-Kim-DRIDSWCM,
      Author = {Kihwan Kim and Dongryeol Lee and Irfan Essa},
      Blog = {http://prof.irfanessa.com/2012/04/09/paper-cvpr2012/},
      Booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      Date-Added = {2012-04-09 22:37:06 +0000},
      Date-Modified = {2013-10-22 18:53:11 +0000},
      Doi = {10.1109/CVPR.2012.6247809},
      Pdf = {http://www.cc.gatech.edu/~irfan/p/2012-Kim-DRIDSWCM.pdf},
      Publisher = {IEEE Computer Society},
      Title = {Detecting Regions of Interest in Dynamic Scenes with Camera Motions},
      Url = {http://www.cc.gatech.edu/cpl/projects/roi/},
      Video = {http://www.youtube.com/watch?v=19BMwDMCSp8},
      Year = {2012},
      Bdsk-Url-1 = {http://www.cc.gatech.edu/cpl/projects/roi/},
      Bdsk-Url-2 = {http://dx.doi.org/10.1109/CVPR.2012.6247809}}

Abstract

We present a method to detect the regions of interest in moving camera views of dynamic scenes with multiple moving objects. We start by extracting a global motion tendency that reflects the scene context by tracking movements of objects in the scene. We then use Gaussian process regression to represent the extracted motion tendency as a stochastic vector field. The generated stochastic field is robust to noise and can handle a video from an uncalibrated moving camera. We use the stochastic field for predicting important future regions of interest as the scene evolves dynamically.

We evaluate our approach on a variety of videos of team sports and compare the detected regions of interest to the camera motion generated by actual camera operators. Our experimental results demonstrate that our approach is computationally efficient, and provides better prediction than those of previously proposed RBF-based approaches.
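The Gaussian process regression step can be sketched with a standard RBF kernel, regressing each velocity component of the motion field independently. Kernel choice, hyperparameters, and the independent-component assumption are illustrative here, not taken from the paper:

```python
import numpy as np

def rbf_kernel(A, B, ell=1.0):
    """Squared-exponential kernel between two sets of 2D positions."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gp_vector_field(X, V, Xq, ell=1.0, noise=1e-2):
    """GP regression of observed motion vectors V (n, 2) at positions
    X (n, 2) onto query points Xq (m, 2). Returns the predicted mean
    field and a per-query predictive variance."""
    K = rbf_kernel(X, X, ell) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X, ell)
    alpha = np.linalg.solve(K, V)           # (n, 2)
    mean = Ks @ alpha                       # predicted vector field (m, 2)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var
```

Evaluating the posterior mean on a dense grid gives the smooth stochastic field; the variance marks regions where the tracked motions say little, which is what makes the field robust to tracking noise.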

Presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012, Providence, RI, June 16-21, 2012
