Voice is a natural tool for communication. Many people prefer to resolve issues by talking rather than writing, simply because it’s faster. In business communication with clients, it is a convenient and natural way to interact. But not every company can expand its call center staff in proportion to the growth of its customer base. Automation becomes an effective way to scale live communication with customers: it preserves the familiar channels of communication and reaches more contacts without sacrificing quality.
Voice technologies are used in many fields and suit all audiences: children are drawn to interactive “talkers”, young people appreciate voice control of smart devices, and older people have an assistant read them the news. But voice assistants are most in demand in industries with a lot of one-to-one communication with customers: finance, retail, and telecom.
Major companies have been using voice technology for years. Bank of America has had its Erica virtual assistant up and running since 2017. Mercedes-Benz has been rolling out MBUX (Mercedes-Benz User Experience), a system that understands voice commands, since 2018. Retailer Walmart launched an app with the voice assistant Ask Sam, which helps with product searches. According to Adobe Analytics, 91% of brands are already investing heavily in voice solutions and plan to increase investment. Just AI predicts that the Russian voice AI market will grow at a rate of 38% to 81% over the next five years and reach $561 million in 2025.
Believe it or not.
Businesses measure the effectiveness of voice technology adoption by customer satisfaction and brand loyalty. But many customers greet the innovation with tempered enthusiasm. According to Voicebot.ai, only 45% of users want to see voice assistants in mobile apps. The main reasons for the dislike, according to Neuro.net, are the poor quality of responses and the synthetic-sounding speech of voice assistants. These problems are typical of interfaces built on previous-generation technology. Modern machine-learning algorithms can synthesize voices that no longer sound soulless.
Another limiting factor is that voice technology has spread into scenarios that are “good” from the customer’s perspective and into “bad” ones alike. There are not yet many companies on the market that specialize in developing voice interfaces, and the number of voices they can offer is limited. As a result, if a person is pestered by advertising or fraudulent calls today and a helpful call comes in tomorrow, the conversation will fail because “all robots sound the same”. Once the reputation of a voice assistant is damaged, the effectiveness of useful calls to the client drops to zero. That’s why companies create a Brand Voice, a voice unique to the brand.
“A unique voice is an important part of a brand, like a logo or a brand font. More and more of our clients are using this feature and talking to their customers in unique voices. We record a set of phrases with a specific intonation in the voice of a company employee or an announcer. Dynamic data, such as phone numbers or addresses, is generated automatically by a self-learning system that reproduces the employee’s voice and retains realistic intonations. This is how companies automate communications while retaining customer loyalty and increasing conversion rates: people are pleased to be spoken to in a live voice, and they are willing to have a conversation.”
Ivan Artemiev, MTT Product Director
Make the model talk
The cost of Brand Voice starts at 150,000 rubles and depends on the area of application and the complexity of the voice synthesis model. Creating a solution consists of two parts, technical and logical, and a separate product team is responsible for each.
An important stage of the technical part is selecting the voice on which speech synthesis will be based. Its intonation should reflect the brand attributes the company wants to promote. A professional announcer or voice actor will need to record up to 40 hours of speech material. The recording should be of high quality, free of extraneous noise, and the pronunciation correct, because this material is what the voice robot’s model will be trained on.
It takes from one to six months to train the model and implement full synthesis, depending on complexity. But the technology is evolving, and the required studio recording time is gradually decreasing. In the future, it may be possible to get a good voice robot from only 2-3 hours of raw audio.
Training the artificial intelligence
When the recording is ready, the training of the voice model begins. It processes the recorded material, learns to reproduce the voice, and as a result is able to synthesize speech itself from any arbitrary text.
Transformers, a deep neural network architecture introduced in 2017 by Google Brain researchers, are used to solve this class of problems. The best-known transformers are the GPT (Generative Pre-trained Transformer) neural networks developed by OpenAI. This technology can, for example, fill in a gap in a phrase or predict its next word with high accuracy, based on the preceding words.
The same principle is used to create Brand Voice solutions. The model is trained on a large amount of data: several models with different parameters are trained, and the best one is selected. It is important that the robot correctly “translates” text into voice and makes no mistakes in pronunciation or intonation. To improve the quality of the synthesis, the model is further trained for specific usage scenarios, which produces the most natural-sounding voices.
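The selection step can be pictured with a small sketch. It is only an illustration under stated assumptions: the `Candidate` fields, the ranking rule (fewest pronunciation errors first, then the highest naturalness rating) and all the numbers are hypothetical and do not describe how any particular vendor actually ranks its models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One trained voice model and its (hypothetical) evaluation results."""
    name: str
    learning_rate: float        # example of a parameter that differs between runs
    pronunciation_errors: int   # errors counted on a held-out test script
    naturalness: float          # average listener rating, 1.0 to 5.0


def select_best(candidates):
    """Pick the model with the fewest errors; break ties by naturalness."""
    return min(candidates, key=lambda c: (c.pronunciation_errors, -c.naturalness))


# Illustrative numbers only, not real measurements.
candidates = [
    Candidate("run-a", 1e-3, pronunciation_errors=7, naturalness=4.1),
    Candidate("run-b", 5e-4, pronunciation_errors=3, naturalness=3.9),
    Candidate("run-c", 1e-4, pronunciation_errors=3, naturalness=4.3),
]

best = select_best(candidates)
print(best.name)
```

Putting pronunciation errors ahead of naturalness reflects the point made above that mistakes in pronunciation matter most; a real project might weigh the criteria differently.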
Where’s the logic?
The meaningful content of the robot, its business logic, and human interaction scenarios are created in close communication with the customer. In order for a voice assistant to bring maximum benefit to a business, you need to have a good understanding of how that business is organized, what questions the customer will ask, and in what situations the assistant will be approached by the customer.
Coming up with cases from scratch is a bad idea: the logic of interaction with the client should be real. If the assistant greets the person on the phone line, its scenario is based on a consulting, sales or other script, that is, the sequence of actions a call center employee follows in a dialogue with a client. When preparing a scenario for the voice assistant, it helps to analyze requests from real users, interview the employees who regularly communicate with them, or run UX experiments aimed at finding out what people actually ask for.
Many companies want the voice assistant to help customers with issues that are difficult to handle on their own. For example, it makes sense to give the robot the functions that are buried deep or non-obvious in a mobile application.
Irina Stepanova, conversational interface designer and analyst at Just AI: “You have to understand that the customer behaves differently in different channels: chat, app, phone. So first of all you need to carefully study the customer journey map in the channels where you plan to introduce a voice assistant. In a visual interface the client has fewer ways to make a mistake, because almost everything the service has to offer is in front of their eyes. In a voice interface, the user does not have a good feel for the limitations of the service, so you have to assume that they may state their request in a long phrase, from which the program must pick out the important phrases to determine the essence of the request. A separate task is to design the handling of off-topic requests, for which there is no ready-made scenario. The client can ask anything. What makes a robot human is the variability of its answers, when it answers the same question in different ways.”
One of the problems in developing a voice interface is discoverability: how can the user tell what the assistant knows and what it can help with? Here the assistant has to be proactive: announce its skills and abilities, guide the user through the scenario by suggesting next steps, and help in dead-end branches when the user ends up in the handling of unrecognized requests. The assistant’s abilities can also be promoted outside the assistant itself: in ads, newsletters, and other marketing tools.
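Two of the ideas in this quote, picking key phrases out of a long request and answering the same question in different ways, can be shown in a toy sketch. Everything here is an assumption made for illustration: the intents, the key phrases and the reply texts are invented, and a production assistant would use a trained language-understanding model rather than substring matching.

```python
import random

# Hypothetical intents and the key phrases that signal them.
INTENT_PHRASES = {
    "balance": ("balance", "how much money"),
    "block_card": ("block my card", "lost my card", "card was stolen"),
}

# Several wordings per intent so the assistant does not repeat itself.
RESPONSES = {
    "balance": [
        "Sure, let me check your balance.",
        "One moment, I am looking up your balance.",
        "Of course, I will find your balance right away.",
    ],
    "block_card": [
        "I will block the card right now.",
        "Understood, blocking your card.",
    ],
    "off_topic": [
        "I am not sure I can help with that yet.",
        "That is outside what I can do at the moment.",
    ],
}


def detect_intent(utterance):
    """Find the first intent whose key phrase occurs in a long request."""
    text = utterance.lower()
    for intent, phrases in INTENT_PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return intent
    return "off_topic"


def reply(utterance, rng=random):
    """Answer with a randomly chosen wording for the detected intent."""
    return rng.choice(RESPONSES[detect_intent(utterance)])


print(reply("Hi, I was wondering whether you could tell me my balance, please"))
```

Passing a seeded `random.Random` instance as `rng` makes the chosen wording reproducible, which is convenient when testing dialogue scenarios.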
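A proactive handler for unrecognized requests might look roughly like the sketch below. The skill list, the wording, and the decision to hand over to a human operator after the third miss are all assumptions made for the example; the article does not prescribe a specific escalation policy.

```python
# Hypothetical skills the assistant can announce to the user.
SKILLS = ["check your balance", "block a card", "find the nearest branch"]


def fallback_reply(misses):
    """Return a reply for the N-th consecutive unrecognized request."""
    if misses <= 1:
        # First miss: announce what the assistant can do.
        listed = ", ".join(SKILLS[:-1]) + " or " + SKILLS[-1]
        return "Sorry, I did not catch that. I can help you " + listed + "."
    if misses == 2:
        # Second miss: suggest a concrete phrase as the next step.
        return "Let's try again. You could say: 'I want to " + SKILLS[0] + "'."
    # Third miss and later: leave the dead-end branch (assumed policy).
    return "I will connect you to an operator."


for attempt in (1, 2, 3):
    print(fallback_reply(attempt))
```

The counter of consecutive misses would be kept in the dialogue state and reset whenever a request is recognized.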
A voice assistant should not only be useful, but also interesting to talk to. The developers always try to put as much as possible into the “brain” of Brand Voice, to give it a character and personality.
Learning is an ongoing process
The development of the voice model doesn’t stop after it goes live. After six months the quality of the model improves, and after a year it has changed beyond recognition. If the client has allowed logging, i.e. recording information about events during the voice assistant’s work, all data about errors is collected and used for further training of the model. Logging is particularly useful when the assistant fails to recognize specific words and phrases or mispronounces them, such as the names of medicines or of items in a delivery service’s catalog.
Brand Voice creation usually takes place in the cloud and requires the use of personal data, which often raises security concerns for customers. And while distrust of the cloud is an outdated stereotype, if it is important to the customer that the data does not leave the company perimeter, it can be processed strictly within the organization’s own IT infrastructure. Personal data is also used in logging, so the data is anonymized to keep it confidential.
Creating new work scenarios and retraining models for Brand Voice is an ongoing process. In fact, when you order a turnkey voice solution, you receive a service that is continually being improved. A truly high-quality voice assistant is capable not only of replacing the staff of an entire call center, but also of becoming a bright accent that adds personality to the company’s image.
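As a rough illustration of anonymizing log entries before they are stored, the sketch below masks phone numbers and e-mail addresses with regular expressions. The patterns and the placeholder tokens are assumptions chosen for the example; real anonymization usually covers more kinds of personal data (names, addresses, account numbers) and relies on more robust detection.

```python
import re

# Simplified, hypothetical patterns; they will not catch every format.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s\-()]{8,}\d")


def anonymize(text):
    """Replace e-mail addresses and phone numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    text = PHONE.sub("<PHONE>", text)
    return text


entry = "Client asked to call back at +7 495 123-45-67 or write to ivan@example.com"
print(anonymize(entry))
```

Short numbers such as order quantities are left untouched because the phone pattern requires at least ten characters.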