AI and ASR for the inclusion of remote, virtual and hybrid work | Pipeline magazine


By Alex Kozlov

Communication Access Real-Time Translation (CART) services convert speech into text or captions as it is spoken during a classroom lecture, a business meeting, a public speech, a sporting event or an artistic performance. For people who are deaf or hard of hearing, who are not fluent in English, or who have auditory processing disabilities, CART services are essential for basic understanding and participation. In addition, because CART produces a written record of the event in real time, it can improve the accuracy and retention of information.

Traditionally, CART services have been provided by highly trained stenographers. Besides learning to write at up to 260 words per minute, a CART captioner often needs further specialized training in fields such as law or medicine to capture technical terminology accurately. Because these skills are rare and in high demand, CART services tend to be expensive and hard to find.

Today, advances in artificial intelligence (AI) and speech recognition are redefining how real-time transcription is delivered. By combining human skills with intelligent software, these innovations could dramatically expand the availability of CART services. Easier access to CART, in turn, promises new opportunities to improve communication and strengthen the inclusion of people with different learning styles and communication needs.

Automatic Speech Recognition (ASR) software recognizes speech on two fronts. The first is acoustic: an ASR application “listens” to the sound of a spoken word and produces the corresponding text. Although this sounds simple, converting sounds into written words poses a variety of challenges, including accents, jargon and vocal inflections, as well as background noise that must be filtered out. In recent years, ASR has improved significantly at understanding accents, has become less dependent on being trained to an individual’s voice, and has become less sensitive to its acoustic environment.
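Homophones show why the acoustic stage alone is not enough. The sketch below is a toy illustration, not how a production ASR system works: it assumes a tiny hand-written pronunciation lexicon (the `LEXICON` dictionary and the `acoustic_candidates` function are hypothetical) written with ARPAbet-style phoneme symbols, stress markers omitted, and returns every word whose pronunciation matches a recognized phoneme sequence.

```python
# Toy illustration only: a hypothetical pronunciation lexicon using
# ARPAbet-style phoneme symbols (stress markers omitted).
LEXICON = {
    "steak": ("S", "T", "EY", "K"),
    "stake": ("S", "T", "EY", "K"),
    "stick": ("S", "T", "IH", "K"),
}


def acoustic_candidates(phonemes):
    """Return every lexicon word whose pronunciation matches the phonemes."""
    target = tuple(phonemes)
    return sorted(word for word, pron in LEXICON.items() if pron == target)


# "steak" and "stake" sound identical, so acoustics alone cannot choose.
print(acoustic_candidates(["S", "T", "EY", "K"]))  # ['stake', 'steak']
```

Because both spellings match the same sounds, picking the right one requires the kind of contextual analysis described in the next paragraph.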

In addition to recognizing sounds, ASR applications use natural language processing models to provide a contextual framework that analyzes the broader meaning of word combinations. Among other things, this helps the program choose the correct spelling and usage. For example, in the sentence “I like my steak medium-rare”, the words “medium-rare” suggest that the sentence refers to “steak” as food, rather than a “stake” in an organization or a “stake” driven into the ground. Conversely, in a phrase such as “a stake in the ground”, the program recognizes that the word refers to a post rather than a cut of meat. Similarly, contextual analysis may determine that “to put a stake in the ground” is likely an idiomatic expression rather than a literal statement. Based on this determination, the program can more accurately predict the context of the rest of the discussion.
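A deliberately simplified sketch of this idea follows. Real ASR systems rely on statistical or neural language models rather than hand-written word lists, but the principle of letting surrounding words decide between homophones can be shown in a few lines. The cue lists in `CUES` and the `choose_spelling` function are hypothetical, invented for illustration.

```python
import re

# Hypothetical cue words for each spelling, hand-written for illustration only.
CUES = {
    "steak": {"medium-rare", "grill", "eat", "dinner", "well-done"},
    "stake": {"ground", "company", "shares", "organization", "drive", "wooden"},
}


def tokenize(sentence):
    # Lowercase words, keeping internal hyphens so "medium-rare" stays one token.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())


def choose_spelling(sentence, candidates=("steak", "stake")):
    """Pick the candidate whose cue words overlap most with the sentence."""
    words = set(tokenize(sentence))
    scores = {c: len(words & CUES[c]) for c in candidates}
    # On a tie, max() keeps the first candidate, so "steak" is the default.
    return max(candidates, key=lambda c: scores[c])


print(choose_spelling("I like my ___ medium-rare"))       # steak
print(choose_spelling("We drove a ___ into the ground"))  # stake
```

The blank (`___`) marks the word whose sound is ambiguous; only the words around it influence the choice.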

Today’s ASR tools are quite adept at transcribing audio recordings after the fact. Widely available, affordable tools let a user upload an audio file and receive an accurate transcript in as little as five to ten minutes. Because this transcription happens offline, however, these tools have the luxury of analyzing the entire discussion before producing the transcript. This backward-and-forward view lets the program identify the general context of the discussion and therefore deliver a much more accurate result. A CART application, by contrast, faces the much harder task of performing contextual analysis on the fly: it must assess the context of each word and phrase as it is spoken, and predict the context of words that have not yet been spoken.
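The difference between offline and real-time transcription comes down to which words are visible when the program must commit to a spelling. The sketch below assumes a hypothetical `visible_context` helper and a hypothetical cue set `STAKE_CUES`; `lookahead=None` models offline transcription (the whole recording is available), while a whole-number `lookahead` models a streaming recognizer that may wait for that many future words before deciding.

```python
# Hypothetical cue words for the "post driven into the ground" sense of "stake".
STAKE_CUES = {"ground", "fence", "tent"}


def visible_context(words, i, lookahead=None):
    """Return the words a recognizer can consult when transcribing words[i].

    lookahead=None models offline transcription: every later word is visible.
    lookahead=k models a streaming recognizer that waits for at most k more
    words (adding delay) before committing; k=0 means no waiting at all.
    """
    before = words[:i]
    if lookahead is None:
        after = words[i + 1:]
    else:
        after = words[i + 1:i + 1 + lookahead]
    return before + after


def sees_stake_cue(words, i, lookahead=None):
    return any(w in STAKE_CUES for w in visible_context(words, i, lookahead))


words = "please hammer the ___ into the ground".split()
i = words.index("___")
print(sees_stake_cue(words, i))               # True  (offline)
print(sees_stake_cue(words, i, lookahead=0))  # False (no waiting)
print(sees_stake_cue(words, i, lookahead=3))  # True  (waits three words)
```

Waiting for more words improves the guess but delays the caption, which is the trade-off a real-time CART application has to manage.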

To address the challenges of real-time speech recognition, researchers are developing end-to-end “transformer” models that apply deep learning techniques to streamline the task of contextualizing words, sentences, and paragraphs. Rather than dealing
