Basic concepts of speech

Speech is a complex phenomenon. People rarely understand how it is produced and perceived. The naive perception is often that speech is built of words, and each word consists of phones. The reality is unfortunately very different. Speech is a dynamic process without clearly distinguished parts. It is always instructive to open a recording of speech in a sound editor, look at the waveform, and listen to it.

All modern descriptions of speech are to some degree probabilistic: there are no certain boundaries between units or between words. Speech-to-text conversion and other speech applications are never 100% correct. That idea is rather unusual for software developers, who usually work with deterministic systems, and it creates a number of issues specific to speech technology.
Structure of speech

In current practice, speech structure is understood as follows:

Speech is a continuous audio stream in which rather stable states mix with dynamically changing states. In this sequence of states, one can define more or less similar classes of sounds, or phones. Words are commonly understood to be built of phones, but this is certainly not accurate: the acoustic properties of a waveform corresponding to a phone can vary greatly depending on many factors – phone context, speaker, style of speech and so on. So-called coarticulation makes phones sound very different from their “canonical” representation. Next, since the transitions between phones are more informative than the stable regions, developers often talk about diphones – segments of signal spanning the transition between two consecutive phones. Sometimes developers talk about subphonetic units – different substates of a phone. Often three or more regions of a different nature can easily be found.

The number three is easily explained: the first part of a phone depends on its preceding phone, the middle part is stable, and the last part depends on the subsequent phone. That is why HMM-based recognition typically models each phone with three states.
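
To make this concrete, here is a minimal sketch of such a three-state, left-to-right HMM topology for one phone, written in Python. The state names and transition probabilities are illustrative assumptions, not values taken from any particular recognizer.

    # A toy left-to-right HMM topology for a single phone.
    # Probabilities below are invented for illustration.
    PHONE_STATES = ("begin", "middle", "end")

    # Each state loops on itself (modeling a stable stretch of audio)
    # or advances to the next state; "exit" leaves the phone model.
    TRANSITIONS = {
        "begin":  {"begin": 0.6, "middle": 0.4},
        "middle": {"middle": 0.7, "end": 0.3},
        "end":    {"end": 0.5, "exit": 0.5},
    }

    # Probability of one particular path through the model:
    path = ["begin", "begin", "middle", "end", "exit"]
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= TRANSITIONS[a][b]
    print(p)  # 0.6 * 0.4 * 0.3 * 0.5 = 0.036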

Sometimes phones are considered in context. Such context-dependent units are called triphones or even quinphones. Note that, unlike phones and diphones, they are matched with the same range of the waveform as plain phones; they differ only by name, because they encode context. That is why we prefer to call this object a senone. A senone’s dependence on context can be more complex than just the left and right neighbors: it can be a rather complex function defined by a decision tree, or in some other way.
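
As a hypothetical illustration (the question and the tying scheme below are invented, not taken from a real model), a senone can be thought of as the answer a small decision tree gives for a triphone state:

    # Toy decision tree that ties triphone states into senones.
    # Real trees are trained on data; this one asks a single
    # invented question about the left context.
    VOWELS = set("aeiou")

    def senone_for(left, phone, right, state):
        """Map one state of a triphone (left-phone+right) to a senone id."""
        if phone in VOWELS:
            # Question: is the left context a vowel?
            ctx = "Lvow" if left in VOWELS else "Loth"
            return f"{phone}_s{state}_{ctx}"
        # Consonant states here are fully tied across contexts.
        return f"{phone}_s{state}"

    # Different triphones can share the same senone:
    print(senone_for("t", "k", "a", 2))  # k_s2
    print(senone_for("p", "k", "i", 2))  # k_s2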

Next, phones build subword units, like syllables. Syllables are sometimes defined as “reduction-stable entities”: when speech becomes fast, phones often change, but syllables remain the same. Syllables are also related to the intonational contour. There are other ways to build subwords: morphologically based in morphology-rich languages, or phonetically based. Subwords are often used in open vocabulary speech recognition, as sketched below.
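
As a toy illustration of that idea (the subword inventory here is invented), an out-of-vocabulary word can be covered by greedy longest-match segmentation over known units:

    # Cover a word with known subword units, longest match first.
    SUBWORDS = {"un", "re", "cogn", "iz", "able", "speak", "er", "s"}

    def segment(word):
        """Greedily split a word into known subword units, or return None."""
        units, i = [], 0
        while i < len(word):
            for j in range(len(word), i, -1):  # longest match first
                if word[i:j] in SUBWORDS:
                    units.append(word[i:j])
                    i = j
                    break
            else:
                return None  # no unit covers position i
        return units

    print(segment("unrecognizable"))  # ['un', 're', 'cogn', 'iz', 'able']
    print(segment("speakers"))        # ['speak', 'er', 's']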

Subwords form words. Words are important in speech recognition because they restrict combinations of phones significantly. If there are 40 phones and an average word has 7 phones, there are 40^7 possible phone sequences for a word of that length. Luckily, even a very educated person rarely uses more than 20,000 words in everyday practice, which makes recognition far more feasible.
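
The arithmetic behind that claim, checked in a couple of lines (40 phones and a 20,000-word vocabulary are the illustrative figures used above):

    # Possible 7-phone sequences versus a realistic vocabulary.
    phones = 40
    avg_phones_per_word = 7
    possible = phones ** avg_phones_per_word
    vocabulary = 20_000

    print(f"{possible:,}")         # 163,840,000,000
    print(possible // vocabulary)  # 8,192,000 times larger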

Words and other non-linguistic sounds, which we call fillers (breath, um, uh, cough), form utterances – separate chunks of audio between pauses. Utterances do not necessarily match sentences, which are a more semantic concept.
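
Below is a minimal sketch of cutting a recording into utterance-like chunks at pauses, using a naive short-time energy threshold. The frame length, threshold, and minimum pause duration are illustrative assumptions, not tuned values; real systems use better voice activity detection.

    import numpy as np

    def split_utterances(samples, rate, frame_ms=30,
                         threshold=1e-4, min_pause_frames=10):
        """Return (start, end) sample indices of non-silent chunks."""
        frame = int(rate * frame_ms / 1000)
        n = len(samples) // frame
        energy = np.array([np.mean(samples[i*frame:(i+1)*frame] ** 2)
                           for i in range(n)])
        voiced = energy > threshold

        chunks, start, silence = [], None, 0
        for i, v in enumerate(voiced):
            if v:
                if start is None:
                    start = i
                silence = 0
            elif start is not None:
                silence += 1
                if silence >= min_pause_frames:
                    # Close the chunk at the first silent frame.
                    chunks.append((start * frame, (i - silence + 1) * frame))
                    start, silence = None, 0
        if start is not None:  # audio ended mid-utterance
            chunks.append((start * frame, n * frame))
        return chunks

    # Example usage with 16 kHz audio loaded as a float numpy array:
    # chunks = split_utterances(audio, 16000)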

