Following the recent release of Voice Search in Vietnamese on Android
devices, Google has announced the arrival of Vietnamese Voice Search on
desktop computers via the Chrome browser.
When visiting google.com.vn on Chrome, users can click on the microphone icon to ask their question.
Amy Kunrojpanya, Head of Public Affairs Communications for the
Greater Mekong Sub-Region, Google Asia Pacific, said her company hopes
this will prove useful to Vietnamese consumers and help them explore
more of the web easily and intuitively.
According to
Kunrojpanya, each time Google brings Voice Search to a new language, it
teaches computers to understand the sounds and words that make up spoken
language.
Google accomplished this by working with native speakers to collect speech samples to model the language, she said.
The Vietnamese-specific language model was built from the ground up.
“For Vietnamese, we worked with about 700 volunteers from universities
in Hanoi and HCM City to collect about 480 hours of speech samples.
Once we collected the samples, we were able to build acoustical and
language models that taught computers how to ‘recognise’ Vietnamese,”
she said.
Google spent two years working with the local volunteers.
“We were able to gather huge amounts of data from Google fans in
Vietnam, who were eager to help. Many people opened their doors to us to
help the cause of making Vietnamese awesome,” said Kunrojpanya.
She said the Vietnamese language had presented unique challenges. The
main challenge was recognising tones and transcribing the diacritics
correctly (for example, ca means to chant; cà, tomato; cá, fish).
The diversity of accents across Vietnam also required Google to widen
the sampling and collect twice as many acoustic samples as it normally
does for other languages.
Google tried to capture both
northern and southern accents, spending months “on lexicon development
on a complicated language”, she said.
“Voice Search can
recognise regional accents in Vietnamese, but it isn’t 100 percent
perfect. The good thing is that the language model improves as more
people use it,” she said.
Also, because written Vietnamese places a space after each syllable rather than after each word, it is harder to tell where a word begins and ends.
In English, by contrast, spaces separate whole words, she said.
Google therefore introduced special handling of Vietnamese syllables so
that each one could be interpreted correctly in the context of the
syllables around it.
There were other challenges as well. For example,
many Vietnamese Google users leave out accents and tone
markers when they search (for example, typing pho instead of phở).
“So
we had to create a special algorithm to ensure accents and tones were
restored in the search results provided, and then our Vietnamese users
would see properly formatted text in the majority of cases,” said
Kunrojpanya.-VNA