N. Diakopoulos, K. Luther, I. Essa (2008), “Audio Puzzler: Piecing Together Time-Stamped Speech Transcripts with a Puzzle Game.” In Proceedings of the ACM International Conference on Multimedia 2008, Vancouver, BC, Canada. [Project Link]


We have developed an audio-based casual puzzle game that produces a time-stamped transcription of spoken audio as a by-product of play. Our evaluation indicates that the game is both fun and challenging. The transcripts generated through the game are more accurate than those produced by a standard automatic transcription system, and the word time-stamps fall within several hundred milliseconds of ground truth.

Paper in ACM Multimedia (2006): “Interactive mosaic generation for video navigation”

K. Kim, I. Essa, and G. Abowd (2006), “Interactive mosaic generation for video navigation.” In Proceedings of the 14th Annual ACM International Conference on Multimedia, pages 655-658. [Project Page | DOI | PDF]


Navigating large multimedia collections that include videos and images remains cumbersome. In this paper, we introduce a novel method to visualize and navigate such a collection by creating a mosaic image that visually represents it. The mosaic is generated by a labeling-based layout algorithm that arranges sample tile images of various sizes drawn from the collection. Tiles represent both photographs and video files, showing scenes selected by matching algorithms. The resulting mosaic offers a new way to navigate videos thematically and provides a visual summary of them. Users can generate these mosaics from predefined themes and layouts, or base them on the results of their own queries. Our approach supports automatic generation of these layouts using meta-information such as color, timeline, and the presence of faces, or manually generated annotations from existing systems (e.g., the Family Video Archive).
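To make the color-based part of this idea concrete, here is a minimal sketch of generic photomosaic tile selection. It is not the paper's labeling-based layout algorithm. The sketch assumes a uniform grid of equal-sized cells (rather than tiles of various sizes) and summarizes every cell and every tile by a single mean RGB color. The `Tile` class, `nearest_tile`, `build_mosaic`, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    # Hypothetical record for one collection item (a photo or a video keyframe).
    name: str
    mean_rgb: tuple[float, float, float]


def nearest_tile(target_rgb, tiles):
    """Return the tile whose mean color has the smallest squared RGB distance to target_rgb."""
    return min(
        tiles,
        key=lambda t: sum((a - b) ** 2 for a, b in zip(t.mean_rgb, target_rgb)),
    )


def build_mosaic(cell_colors, tiles):
    """Assign a tile name to every cell of a uniform grid.

    cell_colors is a list of rows, each row a list of mean RGB tuples.
    Each cell is matched independently; the same tile may be reused.
    """
    return [[nearest_tile(color, tiles).name for color in row] for row in cell_colors]


if __name__ == "__main__":
    tiles = [
        Tile("beach.jpg", (200, 190, 150)),
        Tile("forest.mp4", (40, 110, 50)),
        Tile("night.jpg", (20, 20, 40)),
    ]
    grid = [[(210, 200, 160), (30, 100, 40)], [(10, 10, 30), (190, 180, 140)]]
    print(build_mosaic(grid, tiles))
```

Run on the small grid above, each cell picks the tile with the closest average color, giving `[['beach.jpg', 'forest.mp4'], ['night.jpg', 'beach.jpg']]`. A layout method that handles variable tile sizes, timelines, or face cues, as the abstract describes, would need a richer cost function and a global assignment step rather than this independent per-cell choice.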

