Meta to train speech recognition engines on 'clusters' of speakers using new dataset

By IANS | Updated: July 14, 2023 13:15 IST

San Francisco, July 14: Meta (formerly Facebook) has developed a new dataset which the company will use to improve the performance of automatic speech recognition (ASR) tools by clustering speech at the "utterance level".

As part of Meta's continued commitment to improving ASR performance, the company has taught ASRs to train without transcripts, recognise over 4,000 spoken languages, and even read lips more accurately than humans.

However, many of the datasets used to train ASR models are organised by demographics such as age group, gender, nationality, and English accent. This limits the variation of pronunciations that models are trained on, ultimately hampering their ability to understand a wide range of users.

In order to overcome this, Meta AI developed a dataset that relies instead on utterance clustering.

"Instead of dividing a dataset based on speakers' demographic information -- such as their age group or gender -- our proposed algorithm clusters speech at the utterance level," Meta said in a blogpost on Thursday.

"A single cluster will contain similar utterances from a diverse group of speakers. We can then train our model using the various clusters and use fairness datasets to measure how the model impacts outcomes across different demographic groups," it added.
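To illustrate what clustering at the utterance level means, here is a minimal sketch in Python. It is not Meta's algorithm: the article does not name the clustering method, so plain k-means is used as a stand-in, and the speaker IDs, the two-dimensional "embeddings", and the helper names (`kmeans`, `speakers_per_cluster`) are all made up for illustration. The point it shows is that each utterance is grouped by how it sounds, so one cluster can hold utterances from several different speakers.

```python
from collections import defaultdict
import math

# Hypothetical toy data: (speaker_id, utterance_embedding). A real system would
# derive embeddings from audio; these 2-D vectors are invented for illustration.
UTTERANCES = [
    ("spk_a", (0.10, 0.20)),
    ("spk_b", (0.15, 0.10)),
    ("spk_c", (0.20, 0.25)),
    ("spk_a", (0.90, 0.80)),
    ("spk_b", (0.85, 0.95)),
    ("spk_c", (0.95, 0.90)),
]


def kmeans(points, k, iterations=10):
    """Plain k-means (an assumed stand-in); returns a cluster index per point."""
    # Deterministic start: pick k evenly spaced points as the initial centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every utterance to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def speakers_per_cluster(utterances, labels):
    """Collect the set of speakers whose utterances landed in each cluster."""
    clusters = defaultdict(set)
    for (speaker, _), label in zip(utterances, labels):
        clusters[label].add(speaker)
    return dict(clusters)


labels = kmeans([embedding for _, embedding in UTTERANCES], k=2)
clusters = speakers_per_cluster(UTTERANCES, labels)
for cluster_id, speakers in sorted(clusters.items()):
    print(cluster_id, sorted(speakers))
```

In this toy data every speaker contributes to both clusters, which mirrors the quoted idea that a single cluster contains similar utterances from a diverse group of speakers.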

The company's resulting dataset includes about 27,055 utterances of recorded speech from 595 people in the US, who were paid to record and submit audio of themselves saying commands.

Their utterances are organised around seven main themes: music, capture, utilities, notification control, messaging, calling, and dictation. Other researchers can use the dataset to train their own models and digital assistants.

The speakers were asked how they would voice search for a song, make plans with friends, and decide where to meet up.

To evaluate this new system, Meta trained its model on de-identified, publicly available English-language Facebook videos and then evaluated it on two datasets.

The first is a de-identified dataset collected from a data supplier for ASR that includes 48,000 utterances from 867 speakers, and the second is Casual Conversations v1, a dataset of transcribed speech that Meta built and made publicly available in 2021.

"During testing, we observed that a model trained in this manner improved speech recognition accuracy for all measured demographic groups, and in particular for different accents, which are identified in sociolinguistics as a way of pronouncing a language that is distinctive to a country, area, social class, or individual," Meta said.
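One common way to compare recognition accuracy across groups is word error rate (WER), computed separately for each group. The sketch below assumes WER as the metric, which the article does not confirm; the group labels, the sample transcripts, and the function names (`word_errors`, `wer_by_group`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (group, reference transcript, model output).
SAMPLES = [
    ("accent_x", "play some music", "play some music"),
    ("accent_x", "call my friend", "call friend"),
    ("accent_y", "send a message", "send the message"),
    ("accent_y", "take a photo", "bake a photo"),
]


def word_errors(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Total word errors divided by total reference words, per group."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        errors[group] += word_errors(reference, hypothesis)
        words[group] += len(reference.split())
    return {group: errors[group] / words[group] for group in errors}


result = wer_by_group(SAMPLES)
for group, wer in sorted(result.items()):
    print(f"{group}: {wer:.2f}")
```

A lower WER means higher accuracy, so a claim such as "improved for all measured demographic groups" would correspond to every group's WER going down after the change.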

"While our proposed algorithm was built using English-language data, we hope these approaches can be extended to work for other languages as well," it added.

Disclaimer: This post has been auto-published from an agency feed without any modifications to the text and has not been reviewed by an editor
