Silent Speech Interfaces for all
Multi-speaker speech recognition from ultrasound images of the tongue and video images of the lips
Silent speech interfaces perform speech recognition and synthesis from articulatory data in order to restore spoken communication for users with voice impairments (for example, after laryngectomy) or to allow silent communication in situations where audible speech is undesirable. Much of the previous work in this area has focused on models learned on data from single speakers (called speaker-dependent models), which do not generalize to unknown speakers. This project aimed to investigate a speaker-independent silent speech interface for continuous speech recognition from ultrasound images of the tongue and video images of the lips. This interface was benchmarked against a system trained on high-quality data from a single speaker (speaker-dependent model).
To achieve this, we prepared the Tongue and Lips (TaL) corpus, a multi-speaker corpus of synchronised audio, ultrasound tongue imaging, and lip videos. TaL1 is a set of six recording sessions of one professional voice talent, a male native speaker of English; TaL80 is a set of recording sessions of 81 native speakers of English without voice talent experience. Overall, the corpus contains 24 hours of parallel ultrasound, video, and audio data, of which approximately 13.5 hours are speech. See samples of the TaL corpus here: TaL1 and TaL80. For more details, including how to access the data, check the documentation for the Ultrasuite Repository.
Funded by the Carnegie Trust for the Universities of Scotland Research Incentive Grant – grant number RIG008585 (“Silent speech interfaces for all – recognising speech from ultrasound images of the tongue”). 01/08/2019 → 01/12/2020.
- Using Ultrasound to Identify Inconsistency in Children’s Speech. Tom Starr-Marshall. School of Psychological Sciences and Health, University of Strathclyde, ongoing.
- Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis. Jing-Xuan Zhang. University of Science and Technology of China, P.R. China. Work done while a visiting student in the School of Informatics, The University of Edinburgh, 2020. [Paper (soon)] [Video Sample 1] [Video Sample 2].
- Speech, Gross, and Fine Motor Skills in Children with Autism. Louise McKeever. School of Psychological Sciences and Health, University of Strathclyde, 2020.
- Robust Word Alignment of Child Speech Therapy Sessions using Data Augmentation on Audio and Ultrasound Tongue Imaging. Wang Yau Li. MSc dissertation. School of Philosophy, Psychology & Language Sciences. The University of Edinburgh. 2019.
- Weakly-Supervised Keyword Recognition Applied to Child Speech Therapy Data. Carlos Mocholí Calvo. MSc dissertation. School of Informatics, The University of Edinburgh, 2019.
- Ultrasound-based Audio-Visual Speech Recognition for Children with Speech Disorders. Alexandra Antonides. MSc dissertation. School of Philosophy, Psychology & Language Sciences. The University of Edinburgh. 2018.
- Phone recognition from Ultrasound Data in Child Speech. Jie Chi. MSc dissertation. School of Informatics, The University of Edinburgh, 2018.
Using Ultrasound Visual Biofeedback to Diagnose and Treat Speech Disorders in Children with Cleft Lip and Palate
Children with cleft lip and palate (CLP) often continue to have problems producing clear speech long after the clefts have been surgically repaired, leading to educational and social disadvantage. For these children, speech and language therapy is required to diagnose the specific type of speech disorder present and to provide intervention. Currently, diagnosis is carried out by listening to speech and transcribing it. This is problematic because clinicians disagree on transcriptions, and relying on listening alone can miss errors that are imperceptible but important (e.g., producing “t” incorrectly with both the tip and back of the tongue instead of with the tip alone). This project developed a technical solution for improving the diagnosis of speech disorders in children with CLP. Ultrasound Visual Biofeedback (U-VBF) uses an ultrasound probe placed under the chin to provide a real-time view of the tongue during speech. By recording and analysing these tongue movements, we can observe errors that are impossible to hear, leading to better diagnosis and treatment planning.
Funded by Action Medical Research, 17/04/17 → 31/07/18
This project showed that it is possible to use ultrasound visual biofeedback to remediate previously intractable speech sound disorders (SSDs).
Funded by the Chief Scientist Office, 01/01/15 → 30/06/2016