Pei Yin, Thad Starner, Harley Hamilton, Irfan Essa, James M. Rehg (2009), “Learning Basic Units in American Sign Language using Discriminative Segmental Feature Selection” in IEEE Conference on Acoustics, Speech, and Signal Processing 2009 (ICASSP 2009). Session: Spoken Language Understanding I, Tuesday, April 21, 11:00 – 13:00, Taipei, Taiwan.
The natural language for most deaf signers in the United States is American Sign Language (ASL). ASL has internal structure like spoken languages, and ASL linguists have introduced several phonemic models. The study of ASL phonemes is not only interesting to linguists, but also useful for scalability in recognition by machines. Since machine perception is different than human perception, this paper learns the basic units for ASL directly from data. Comparing with previous studies, our approach computes a set of data-driven units (fenemes) discriminatively from the results of segmental feature selection. The learning iterates the following two steps: first apply discriminative feature selection segmentally to the signs, and then tie the most similar temporal segments to re-train. Intuitively, the sign parts indistinguishable to machines are merged to form basic units, which we call ASL fenemes. Experiments on publicly available ASL recognition data show that the extracted data-driven fenemes are meaningful, and recognition using those fenemes achieves improved accuracy at reduced model complexity