AERO: AI-Enhancements Responsive to Orality
By Jonathan Hudlow, Joshua Nemeck, and Cassie Weishaupt
2024.10
Imagine you’ve just wrapped up an audio translation project, recording the book of Luke in a North African language. After months of hard work, you are feeling relieved. But during the review phase, a few glaring issues emerge. Background noise (a rooster crowing, footsteps, the occasional honking of car horns) compromises the quality of the recording. One of the voice actors, who had previously agreed to participate, now expresses discomfort with their voice being used, citing personal reasons. On top of that, a key term in the translation was mispronounced throughout the recording and needs to be corrected in multiple places. The thought of re-recording everything feels overwhelming.
It’s not uncommon for an Oral Bible Translation (OBT) project to encounter one or more of these challenges. The hypothetical scenario above captures some of the most common real-world issues that OBT teams have shared with us, the AI Capabilities Team. Through our partnership with OBT teams, we’ve begun exploring ways to use advances in artificial intelligence to address these time- and resource-consuming problems. The result is a new suite of tools called AERO (AI-Enhancements Responsive to Orality). AERO currently provides Noise Removal, Voice Conversion, and Transcription through a user interface and an API. Additional features, including audio infilling, are under active development.
Noise Removal: Getting Rid of Distracting Background Sounds
One of the most common challenges in OBT and other audio projects is background noise. Oral language communities, especially those working in less-than-ideal recording environments, often struggle with unwanted sounds that degrade the quality of the final audio.
Sometimes the “recording studio” is a hotel room with towels jammed into the cracks under the door. In other cases, the inside of a car with the windows rolled up is the best recording environment available. We’ve heard of team members alternating between running their car engines with the air conditioning blasting for several minutes and turning everything off for several minutes of recording, to get conditions quiet enough to record and some escape from the sweltering heat. Removing unwanted background noise from an audio file in AERO is now as straightforward as uploading the file and clicking the “de-noise” button.
Voice Conversion: A New Voice Without Re-Recording
Another common challenge in OBT projects is the need for voice changes. Sometimes this occurs when a translation team is made up of all female or all male staff, and they have a part that needs to be recorded for the other gender. In other cases, prospective voice actors may be unwilling to participate in a Bible project due to fear of religious persecution. In some parts of the world, being associated with such a project can even get someone killed.
Whatever the reason, AERO offers a way to address these challenges through its Voice Conversion feature. This tool allows you to modify a speaker’s voice while maintaining the clarity of the language. For example, you can change the gender or pitch of the voice, or completely disguise the speaker’s identity, without sacrificing intelligibility. It has worked for every language we’ve tested so far and keeps the content intact while offering the flexibility to anonymize or adjust voices as needed.
Transcription: Phonetic and Sister-Language-Orthography Speech-to-Text
The third service currently offered through AERO is Transcription. It can be used to transcribe into a “sister language” (a language that shares the same orthography, or, in the case of languages without an established orthography, one that the language community is familiar with) using either the primary script of the sister language or the Latin script. The results also include a phonetic transcription, which teams developing their own orthography have found useful, and which provides the foundation for the audio infilling feature now in development. Audio infilling will eventually allow you to “find and replace” an audio segment throughout a project, blending the new clip seamlessly into the original recording. While infilling is being built, the Transcription tool in AERO is already being used to help jump-start transcription, which was, until recently, a fully manual process. Recent feedback from a user in the field noted that with AERO “experienced transcribers have an assistive tool that reduces their workload by 70%.”
Empowering Language Communities
Translation teams can use AERO to help solve real-world challenges, enhance the quality of their projects, and free up valuable time for other tasks. We’re currently working on three things: integrating AERO into SIL’s Audio Project Manager (APM), making it available to other platforms commonly used for OBT, and exploring offline versions to extend its benefits to the communities that need it most. You can sign up for a free trial at aero.multilingualai.com. We welcome your feedback and partnership in further improving AERO.
Jonathan Hudlow (California), Joshua Nemeck (Michigan), and Cassie Weishaupt (Massachusetts) are data scientists based in the United States with the SIL AI Capabilities Team.
AERO AI Tools: A Quick Review
By John Gieske
I took some time this week to briefly review each of the new AI tools on AERO. Below is a summary of what I found. Please feel free to contact me with questions or for more details. And better yet, try it out yourself!
AERO Voice Conversion
I used four different recordings in local languages and told the tool to change those voices to my voice. The results were impressive, though they didn’t sound like me and they weren’t exactly studio quality.
PROS:
It creates a voice that is believable and quite clear. A local translator thought that it really was somebody from his people group.
The voice was fairly consistent between the four conversions, even though they were in different languages.
In one experiment it even successfully converted from a female voice.
CONS:
The voice sounded nothing like me (others confirmed this).
The conversions have some artifacts that sound like what you would get from saving a low-quality MP3.
There were a couple of places where it struggled with certain sounds. Having a native speaker listen to the entire production from beginning to end would be especially critical if you used this tool.
Read the rest of John's review here...
John Gieske is a Vernacular Media Consultant who lives in Senegal. He has a B.A. in global mass communications and a decade of experience on the field, especially in audio recording and production.

