Sponsored by Emvoice

Emvoice One: Could This Vocal Creation Plug-In Close the Gap Between Songwriting and DAW-Based Production? We Take a Peek Under the Hood!

How many times have you been working on a track and wished you could try out a few new toplines? Perhaps you need a quick female voice for that pop demo you’re sending out tonight. Or, maybe you want to record some backing vocals but don’t have the time or money to hire real singers.

These days, it’s possible to synthesize just about any instrument with pretty convincing results. But when it comes to the human voice, the sound can be a little less than lifelike. The Emvoice One plug-in aims to change that, giving users the power to create instant, realistic vocals in their productions, no recording needed.

Emvoice’s makers say the plug-in brings new potential to creating and editing vocals in DAWs; I sat down with the team, and the software, to find out more.

A Different Kind of Vocal Synth
Many vocal synths apply complex modeling to generate a realistic sound—which can have uneven results. Emvoice takes a different approach, using a dictionary to break down text into granular language building blocks called phonemes, then combining existing recordings of those phonemes from one of their voice libraries in a cloud-based engine to reconstruct a new voice recording. That means when you hear Emvoice, you’re not listening to a modeled voice, you’re listening to a real singer interpreting your lyrics. Users simply enter notes and text into the Emvoice One plug-in, and the cloud-based engine takes it from there. (Voices are available for purchase through the Emvoice One plug-in.)

By taking advantage of these three elements—the plug-in, the engine, and the dictionary—Emvoice users can create voices by simply entering notes and typing words.
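To make the dictionary step concrete, here is a minimal sketch of how text might be broken into phonemes by lookup. It does not show Emvoice's actual dictionary or engine: the `TOY_DICTIONARY` entries, the ARPAbet-style phoneme symbols, and the `text_to_phonemes` function are all assumptions made for illustration.

```python
# Hypothetical toy dictionary; a real pronunciation dictionary is far larger.
# The ARPAbet-style symbols are an assumption, not Emvoice's own notation.
TOY_DICTIONARY = {
    "i": ["AY"],
    "love": ["L", "AH", "V"],
    "you": ["Y", "UW"],
}


def text_to_phonemes(text, dictionary=TOY_DICTIONARY):
    """Look up each word and return one flat list of phonemes."""
    phonemes = []
    for word in text.lower().split():
        key = word.strip(".,!?;:\"'")  # drop surrounding punctuation
        if key not in dictionary:
            raise KeyError(f"no pronunciation for {key!r}")
        phonemes.extend(dictionary[key])
    return phonemes


print(text_to_phonemes("I love you"))
# prints ['AY', 'L', 'AH', 'V', 'Y', 'UW']
```

In this sketch an unknown word raises an error; a real tool would more likely offer a fallback or ask the user to supply a pronunciation.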

And that’s just the beginning: Users can fine-tune words and sounds by manually editing phonemes, and create vibratos, glissandos, legatos and nuanced phrasing via simple piano roll editing.

Synthesis, Samples...and Scripts
Over the past few decades, synthesis technologies have evolved to the point where they can model or re-create almost any instrument with stunning realism. But human vocals have been the last frontier, so to speak.

“If you're talking about drums, if you're talking about pianos, if you're talking about most instruments, you can pretty much recreate them identically to a studio recording,” says Emvoice CEO Jacob McCloskey. “Human voices have lagged behind that. I think the reason is because the voice is so deeply ingrained in the human experience. Our sensitivity towards realism is justifiably high.”

“The technical difficulty of creating this instrument, if I can call our plug-in an instrument, is that the voice, its overtones, change as you speak or sing,” explains Emvoice Chief Technology Officer David Papurt. “Not that a violin is a simple instrument, but from the point of view of analyzing waveforms, a violin is far simpler than the human voice, and I think that was a big part of the roadblock for many, many years.”

Not that that has stopped tech innovators from trying. But the key to realism, says Papurt, is to sample vocals using the right script. “It took someone with a very keen sense of hearing, and also with a knowledge of phonetics,” he says, referring to the vision of Emvoice’s founder, veteran artist and producer Rodolphe Ollivier.

Like Singers on Standby
Emvoice doesn’t strive for a “signature” sound; rather, it aims to offer a palette of realistic voices to use adaptively in productions. For songwriters and producers who aren’t great singers or don’t have access to singers, it’s a game changer: “If you're a producer, you can only get so far without a voice,” says McCloskey. “Arming people with the creativity that comes with access to vocals is the thing that's so exciting for me about this.”

McCloskey says the Emvoice user base is mostly made up of non-vocalist producers, writers who want to draft toplines away from their recording setup, and artists who use the plug-in for “totally off-the-wall purposes.” Common applications include creating demo vocals and reference tracks for background singers, and building backing vocal tracks that remain in the final mix.

“A big application is when people are coming up with lyrics and they're not sure they want to commit to them yet,” he explains. “They might change a few syllables or a few words, but they want to get the melodic idea down.”

Many customers use Emvoice voices as their main vocalists: “This type of use case is growing a lot lately, and I think the trend will continue as more voices release and the voices continue to improve,” says McCloskey.

Emvoice in Action
Emvoice One is a plug-in, available in VST2, VST3, Audio Units, and AAX formats. It will run in just about any DAW, including Logic Pro, Pro Tools, Ableton Live, Studio One, Bitwig, Audacity, and Cubase. The plug-in and vocal engine are free, and voices are currently available for $79 each. Current choices include “Lucy,” a female voice; “Jay,” a male voice; and “Thomas,” a classic vocoder. Emvoice’s fourth voice, a pop female vocalist, is set to release in mid-June.

To create singing, users input notes and words, which are translated into phonemes that are understood by the engine. Then, the engine draws from thousands of recorded vocal samples to generate sound.

Each voice is contained in a library. Libraries are structured so that all of the necessary diphones—combinations of adjacent phonemes—are covered, thanks to elaborate scripts that singers perform in the studio. “The words that these singers have to sing require a lot of enunciation,” said McCloskey.
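The idea of diphone coverage can be sketched in a few lines. This is an illustration only, not Emvoice's library format: the function names are hypothetical, and padding each phrase with a silence marker (`"_"`) so that phrase starts and endings count as transitions is an assumption.

```python
def diphones(phonemes, silence="_"):
    """Return adjacent phoneme pairs, padded with silence at both ends."""
    padded = [silence] + list(phonemes) + [silence]
    return list(zip(padded, padded[1:]))


def missing_diphones(needed, recorded):
    """Return the diphones a lyric needs that a recorded script lacks."""
    return sorted(set(needed) - set(recorded))


# Hypothetical recording script: the phonemes of "love" and "you".
script = diphones(["L", "AH", "V"]) + diphones(["Y", "UW"])
# Hypothetical lyric: "love you" sung as one phrase.
lyric = diphones(["L", "AH", "V", "Y", "UW"])

print(missing_diphones(lyric, script))
# prints [('V', 'Y')]
```

Here the transition from “V” to “Y” never occurs inside either recorded word, which suggests why a recording script has to be designed around transitions rather than individual words.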

Vocal sessions are meticulous work: “The singer has to sing the entire script on one pitch and on the right tempo in order for things to fit, and then the singer does that every two semitones,” Papurt explains. “This is done to capture the variability of the human voice and the characteristics of each pitch. Our engine then synthesizes the pitch changes and resamples any notes out of the singer's natural range, such as very high notes, or very low notes.”
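As a rough illustration of what recording every two semitones implies, the sketch below picks the closest recorded pitch for a requested note and computes the playback-rate ratio needed to reach it in equal temperament. Everything here is an assumption: the MIDI note numbers, the recorded range of 57 to 69, the tie-breaking rule, and the use of simple resampling for every note. How Emvoice's engine actually synthesizes pitch changes is not described in detail.

```python
def recorded_pitches(lowest, highest, step=2):
    """MIDI note numbers at which the script was sung, every `step` semitones."""
    return list(range(lowest, highest + 1, step))


def nearest_recording(target, pitches):
    """Pick the recorded pitch closest to the target; ties go to the lower one."""
    return min(pitches, key=lambda p: (abs(p - target), p))


def shift_ratio(target, source):
    """Playback-rate ratio that moves `source` to `target` in equal temperament."""
    return 2.0 ** ((target - source) / 12.0)


pitches = recorded_pitches(57, 69)  # hypothetical natural range, A3 to A4
for note in (60, 64, 72):
    source = nearest_recording(note, pitches)
    print(note, source, round(shift_ratio(note, source), 4))
# prints:
# 60 59 1.0595
# 64 63 1.0595
# 72 69 1.1892
```

With recordings two semitones apart, a note inside the recorded range is never more than one semitone from a recording, while a note above or below the range needs a progressively larger shift.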

“A choice that I really think Rodolphe nailed was preserving the raw and unprocessed recording quality,” says McCloskey. “We want the voices to be adaptable in all the same ways that a studio a cappella would be. After all, the voices, well, are studio a cappellas.”

Thousands of raw, unprocessed sound files are indexed, called up, and combined by a sophisticated cloud-based engine that returns the complete vocal to a user’s system over the internet. Every time a request for audio arrives, the correct sound files are located and bits are pulled out and spliced together for the subsequent sound. Emvoice voices are royalty-free and can be used on any commercial projects, even in free demo mode.

Because of the sheer volume of content, Emvoice works in the cloud and requires internet access. Papurt is quick to reassure anyone concerned about privacy or stability. “We're simply transmitting audio files with a little extra information about how to synchronize them,” he explains. “We keep some statistical information on usage for R&D purposes, but as far as the actual audio content is concerned, once it's sent back to the user, it’s discarded by our server.”

McCloskey adds that keeping the vocal engine in the cloud gives it more flexibility to evolve. “For example, in the most recent update we added 10 presets for each voice, including hard tuning; smooth legato; doubling options; and Alternate Take, which accesses different samples from the original recording session, effectively creating an entirely new version of the voice.” Other recent enhancements include a MIDI Listen mode, extended voice ranges, and a redesigned interface with light and dark themes.

Emvoice Preset Menu

I took the Emvoice One plug-in for a spin in Logic Pro. Installation was incredibly simple: Registration is via email confirmation, password not required. The plug-in can be authorized on up to three computers. (Each voice in a mix is a unique instance of the plug-in, limited only by a host system’s capabilities.)

I was struck by how easy it was to get started with Emvoice, yet how much granular control I had over nuances of performance. The plug-in features a piano roll interface with simple pen and text-box input tools, along with other basic controls such as quantization settings. Creating a vocal line was as simple as drawing notes, typing lyrics into their associated text boxes, and hitting Play. In less than a minute, I had my first verse in the can. This is where the fun really began.

In Emvoice One, each note is equivalent to one syllable; adjust note lengths by dragging note edges. To create phrases, simply drag notes together to group them with a single text box. Create glissandos or vibrato by sliding notes or note fragments between semitones for fine pitch control. Emvoice even generates realistic breath sounds around programmed notes. Features like pitch- and formant-shifting currently take the form of voice presets.

Custom Pronunciations (in Emvoice’s Light Mode)

It’s easy to stylize sounds to make them sound as human as possible in any context. Perhaps “Ah-ee love you” would sound more natural than “I love you” on your track. Many words offer alternate pronunciations (by right-clicking); you can also craft your own words, accents, or custom pronunciations by stringing together phonemes and saving them in a custom dictionary. (Download Emvoice’s cheat sheet for tons of tips.) Emvoice can import and export project files of lyrics and notes.
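One way to picture a custom dictionary is as a layer that is consulted before the built-in one. The sketch below uses Python's standard `collections.ChainMap` for that layering. The dictionary contents, the phoneme symbols, and the `pronounce` helper are hypothetical and do not reflect Emvoice's file format.

```python
from collections import ChainMap

# Hypothetical built-in pronunciations (ARPAbet-style symbols, for illustration).
DEFAULT_DICTIONARY = {
    "i": ["AY"],
    "love": ["L", "AH", "V"],
    "you": ["Y", "UW"],
}

# Hypothetical user override: sing "I" as "Ah-ee".
custom_dictionary = {"i": ["AA", "IY"]}

# ChainMap searches its mappings in order, so custom entries take priority.
lookup = ChainMap(custom_dictionary, DEFAULT_DICTIONARY)


def pronounce(words, dictionary):
    """Return the phonemes for each word as a separate list."""
    return [dictionary[word.lower()] for word in words]


print(pronounce(["I", "love", "you"], lookup))
# prints [['AA', 'IY'], ['L', 'AH', 'V'], ['Y', 'UW']]
```

Keeping overrides in their own layer means the built-in entries stay untouched, so removing a custom entry simply restores the default pronunciation.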

Ultimately, with a little tinkering I was able to create a realistic-sounding vocal track with human-sounding articulation and expression. And with features like region split, note split, and note join in the pipeline, this process will only get more streamlined.

Emvoice’s new Note Split tool

The Emvoice Endgame
So, where is this technology headed? “Human voices are all unique,” says Papurt. “You rarely confuse the sound of one person's voice for another, and that gives us this great opportunity for having a very large palette of voices.”

“We want to make the user the core of the Emvoice experience, and adaptability is a big part of that. This is why the voices are designed to sound like a raw/unprocessed a cappella recording: It’s more exciting for a user to be able to adapt our voices to their style than it would be for a user to adapt their style to our voices,” adds McCloskey. “We’re currently focusing a lot on improving workflow features within the plug-in. Inspiration comes and goes; often musical ideas are only in a creator’s head for a moment before disappearing forever. If we can make it as simple to capture an idea through Emvoice as it’d be to capture an idea in something like Apple’s Voice Memo app, hopefully we can help artists tap into more creativity.”

Emvoice just launched a note split tool. Over the course of the next few months, the company will also be releasing region split features, a note join tool, and a new female pop voice.

Emvoice One is available as a free eight-note demo version, with full access to the Emvoice voice engine. Lucy, Jay, and Thomas voices are $79 each.

For more information, including video tutorials, visit www.emvoiceapp.com.