MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production
While current sign language translation technology has made significant strides, there is still no viable solution for generating sign sequences directly from spoken content, e.g., text or speech. In this paper, we propose a unified framework for continuous sign language production to ease communication between sign and non-sign language users. The framework converts multimodal spoken data (speech or text) into continuous sign keypoint sequences. In particular, a sequence diffusion model is crafted to generate sign predictions step by step, conditioned on text or speech embeddings extracted by pretrained models such as CLIP and HuBERT. Moreover, by formulating a joint embedding space for text, audio, and sign, we bind data from the three modalities and leverage their semantic consistency to provide informative feedback signals for training the diffusion model. This embedding-consistency learning strategy reduces the reliance on triplet sign language data and enables continued model refinement even when the audio modality is missing. Experiments on the How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in producing signs from both speech and text data.
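For a concrete picture of the two ideas above, the sketch below shows, in plain PyTorch, (i) an epsilon-prediction diffusion training step for a sign keypoint sequence conditioned on a text or speech embedding, and (ii) an InfoNCE-style embedding-consistency loss over a joint text/audio/sign space. This is not the released implementation: the module names, dimensions, and hyperparameters (e.g., SignDenoiser, keypoint_dim=150, 1000 diffusion steps, temperature 0.07) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a conditional diffusion objective for
# sign keypoint sequences plus an embedding-consistency loss. All sizes and
# hyperparameters below are placeholders, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignDenoiser(nn.Module):
    """Predicts the noise added to a keypoint sequence, given the diffusion
    timestep and a conditioning embedding from text (e.g., CLIP) or speech
    (e.g., HuBERT). The Transformer backbone here is a placeholder."""

    def __init__(self, keypoint_dim=150, cond_dim=768, hidden=512, layers=4):
        super().__init__()
        self.in_proj = nn.Linear(keypoint_dim, hidden)
        self.cond_proj = nn.Linear(cond_dim, hidden)
        self.time_emb = nn.Embedding(1000, hidden)  # assume 1000 diffusion steps
        enc_layer = nn.TransformerEncoderLayer(hidden, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.out_proj = nn.Linear(hidden, keypoint_dim)

    def forward(self, noisy_seq, t, cond):
        # noisy_seq: (B, T, keypoint_dim), t: (B,), cond: (B, cond_dim)
        h = self.in_proj(noisy_seq)
        h = h + self.time_emb(t)[:, None, :] + self.cond_proj(cond)[:, None, :]
        return self.out_proj(self.backbone(h))


def diffusion_loss(model, x0, cond, alphas_cumprod):
    """Standard epsilon-prediction objective: noise the clean keypoints x0 to a
    random timestep and ask the model to recover the injected noise."""
    B = x0.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (B,), device=x0.device)
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(B, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    return F.mse_loss(model(x_t, t, cond), noise)


def consistency_loss(text_emb, audio_emb, sign_emb, temperature=0.07):
    """InfoNCE-style loss over a shared embedding space; matched (text, audio,
    sign) items in the batch are positives. When the audio modality is missing,
    pass audio_emb=None and only the text-sign term is used."""
    def nce(a, b):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature
        labels = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, labels)

    loss = nce(text_emb, sign_emb)
    if audio_emb is not None:
        loss = loss + nce(audio_emb, sign_emb) + nce(text_emb, audio_emb)
    return loss
```

In a training step, the two losses would be combined with some weighting; the actual architecture, loss weighting, and sampling schedule used in the paper are not reproduced here.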
How2Sign
Text-to-Sign
Text: Let me demonstrate you this on my back because it’s a lot easier.
Text: Right now, winter ties are probably the more popular way to go.
Text: I have got some leather mittens here.
Audio-to-Sign
Text: And I’m actually going to lock my wrists when I pike.
Text: The rudder is the vertical stabilizer.
Text: There’s the orange portal that we came out of and that’s this test chamber.
Text: So, we’ve got to find a way to get to the exit.
Citation
Please consider citing our paper if it helps your research.
@inproceedings{ma2024ms2sl,
title={MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production},
author={Ma, Jian and Wang, Wenguan and Yang, Yi and Zheng, Feng},
booktitle={ACL},
year={2024}
}