Speech Analysis and Synthesis Model

Homer Dudley developed a speech synthesizer called the VODER (Voice Operating Demonstrator), an electrical device with mechanical controls. Scientists such as Harvey Fletcher and Homer Dudley first established the importance of the signal spectrum for reliably identifying the phonetic nature of a speech sound. Early speech recognition systems were built around the phonetic elements of speech, i.e. the basic sounds of a language, as they are acoustically realized in spoken utterances. The vibration of the vocal cords and the air expelled while speaking create resonances in the vocal tract, similar to the resonances of an acoustic tube.

These resonances are called formants, or formant frequencies.

Formants appear as major regions of energy concentration in the speech power spectrum. In 1952, Davis, Biddulph, and Balashek built a digit recognition system at Bell Laboratories. Another scientist, Forgie, built a speaker-independent recognizer for 10 vowels. In the 1960s, several Japanese labs demonstrated that they could build special-purpose hardware to perform speech recognition. An alternative to using a speech segmenter was the concept of adopting a non-uniform time scale for aligning speech patterns.

Some scientists proposed using dynamic programming to recognize speech on a non-uniform time scale.

This concept was taken up by many scientists, among them Tom Martin at RCA Laboratories and Vintsyuk in the Soviet Union. Martin was the first to point out the need to deal with temporal non-uniformity in repeated speech events in order to make recognizer performance reliable. Vintsyuk proposed using dynamic programming for time alignment between utterances in order to derive a meaningful assessment of their similarity.
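
The time-alignment idea can be illustrated with a small dynamic-programming routine. The sketch below is a textbook formulation of what later became known as dynamic time warping, not a reconstruction of Martin's or Vintsyuk's original methods. The function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration, and the inputs are plain sequences of numbers rather than real speech features.

```python
import math


def dtw_distance(a, b):
    """Return the minimum cumulative alignment cost between sequences a and b.

    Each element of a may be matched to one or more elements of b (and vice
    versa), which is how the time axis is stretched or compressed.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# The same pattern spoken "slowly" (one value held longer) aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

A recognizer built on this idea would compare an incoming utterance against stored reference patterns and pick the one with the lowest alignment cost.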

In the late 1970s, Sakoe and Chiba proposed a more formal method, generally known as dynamic time warping, for speech pattern matching. Since then, mainly owing to the publications of Sakoe and Chiba, dynamic programming has become an indispensable method in automatic speech recognition (ASR). During the late 1960s, Atal and Itakura formulated the fundamental concepts of Linear Predictive Coding (LPC), which simplified the estimation of the vocal tract response from the speech waveform. Building on his earlier success at aligning speech utterances, Tom Martin founded the first speech recognition company, Threshold Technology, and developed the first real ASR product, the VIP-100 System.
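
To make the LPC idea mentioned above more concrete, the following sketch estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. This is one common textbook approach and is not claimed to be the formulation Atal or Itakura used. The function names `autocorrelation` and `lpc` are hypothetical, and the test signal is a synthetic decaying sequence rather than speech.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[k] = sum over t of x[t] * x[t + k]."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Estimate coefficients a[1..order] so that x[t] is approximately
    sum(a[k] * x[t - k]), using the Levinson-Durbin recursion.

    Returns (coefficients, residual_energy).
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0:
            break  # signal is perfectly predicted (or silent)
        # Reflection coefficient for this model order
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


# Synthetic example: each sample is half the previous one, so a first-order
# predictor with a coefficient close to 0.5 describes the signal.
signal = [0.5 ** t for t in range(50)]
coefficients, residual = lpc(signal, 2)
print(coefficients)
```

In speech analysis, the coefficients of such a predictor describe a filter that approximates the vocal tract response, which is why the technique simplified its estimation.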

This system was used in only a few simple applications, for example by television faceplate manufacturing firms and by FedEx for package sorting on a conveyor belt. Its main importance lay in the way it influenced the Advanced Research Projects Agency (ARPA) of the U.S. Department of Defense to fund the Speech Understanding Research (SUR) program during the early 1970s. Harpy, which was able to recognize 1,011 words, was developed during this program. A notable feature of the Harpy system was that the recognition task was represented as a network of lexical representations and word boundaries. Efforts by Fred Jelinek at IBM were aimed at creating a voice-activated typewriter. This was a speaker-dependent system (i.e. each individual user had to train the typewriter).

At AT&T Bell Laboratories, the prime focus of the research program was to provide automated telecommunication services to the public, such as voice dialing and command-and-control routing of phone calls. These were speaker-independent systems (i.e. they required no training by the individual user). To develop them, researchers statistically analyzed a variety of accents and then built models that could handle the speech of a wide range of users. The speech recognition field shifted from a template-based approach to a statistical modeling framework. Rapid developments in statistical methods, most importantly the hidden Markov model (HMM), caused system designs to converge. Much of the speech recognition we see today rests on work done in the 1980s and 1990s. Another technology that found application was the artificial neural network (ANN). Early attempts handled simple words easily; the main difficulty was the continuous variation of speech over time. Research later focused on combining artificial neural networks with hidden Markov models to cope with that variation. In the 1990s, various advances in pattern recognition took place.

These pattern recognition problems were formulated within the Bayes framework. Later, through the continued contributions of academia, researchers, and government, the technology matured (Juang and Rabiner 2004).

Industrial Applications

Speech Recognition for Work Measurement

Human work in industry is now changing from supervising people to supervising robots. Work measurement is one of the most time-consuming and inefficient tasks. Digital image processing combined with speech recognition can make it much more efficient (Sim et al. 2006). In this system, speech recognition is used to determine the current state of the system, and image processing then compares the pixels of consecutive images of the same environment to obtain data about the motion of a worker.
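
The comparison of consecutive image pixels can be sketched as simple frame differencing. This is only an illustration of the general idea, not the method of Sim et al.; the function names, the grayscale nested-list frame format, and the values of `threshold` and `min_fraction` are all assumptions.

```python
def motion_fraction(prev_frame, next_frame, threshold=25):
    """Return the fraction of pixels whose grayscale value changed by more
    than `threshold` between two frames of equal size."""
    changed = 0
    total = 0
    for row_prev, row_next in zip(prev_frame, next_frame):
        for p, q in zip(row_prev, row_next):
            total += 1
            if abs(p - q) > threshold:
                changed += 1
    return changed / total if total else 0.0


def motion_flags(frames, threshold=25, min_fraction=0.05):
    """Return one boolean per consecutive frame pair: True where enough
    pixels changed to count as motion."""
    return [
        motion_fraction(a, b, threshold) >= min_fraction
        for a, b in zip(frames, frames[1:])
    ]


# Three tiny 2x2 grayscale frames: nothing changes, then one pixel lights up.
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[255, 0], [0, 0]],
]
print(motion_flags(frames))
```

Runs of motion and stillness produced this way could then be timed, which is the kind of data work measurement needs.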

The authors implemented a prototype speech recognition system that can acquire the state of the system. The image processing consists of two phases: motion representation followed by cycle segmentation. In the motion representation phase, repeated tasks were identified and the time between them was measured. In the cycle segmentation phase, the repeated motions required to complete one process were identified and measured.

Speech Recognition in Warehouses

Warehouse systems and warehouse control systems are designed to handle all physical aspects of a warehouse or distribution center, including material handling equipment control, stock location mapping, inter-location movement control, inter-zone routing, stock allocation, and order management, all at the operator interface (“W&H Systems’ Warehouse Control System (WCS) Software - Supply Chain 24/7 Paper” 2014). Advances in speech recognition technology have made voice-directed picking and voice-directed distribution possible. With this technology, the computer gives the worker instructions such as where to go and where the product is stored. Workers are then required to confirm that each task has been completed by saying pre-determined phrases.
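
The instruct-then-confirm cycle described above can be sketched as a small loop. This is a hypothetical illustration and does not reflect any vendor's product: the function `run_pick_list`, the confirmation phrase `"done"`, and the retry limit are assumptions, and a list of already-recognized transcripts stands in for a real speech recognizer.

```python
def run_pick_list(tasks, transcripts, confirm_phrase="done", max_attempts=3):
    """Walk through (location, item) tasks and record whether each one was
    confirmed by the expected spoken phrase.

    `transcripts` is an iterable of recognized utterances, used here in place
    of live speech recognition output.
    """
    heard = iter(transcripts)
    results = []
    for location, item in tasks:
        prompt = f"Go to {location} and pick {item}."
        confirmed = False
        for _ in range(max_attempts):
            reply = next(heard, None)
            if reply is None:
                break  # no more speech input available
            if reply.strip().lower() == confirm_phrase:
                confirmed = True
                break
        results.append((prompt, confirmed))
    return results


tasks = [("aisle 4", "box A"), ("aisle 7", "box B")]
for prompt, confirmed in run_pick_list(tasks, ["uh", "Done", "what"]):
    print(prompt, "confirmed" if confirmed else "not confirmed")
```

Restricting the worker's replies to a few pre-determined phrases keeps the recognition task small, which is one reason this style of interaction is practical in noisy environments.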

The benefit of this technology is that workers are safer in their environment because their hands and eyes are free. It is also faster and more accurate than working from paper pick lists (“Voice Picking within a Warehouse to Improve Efficiency | Manufacturing & Logistics IT Magazine” n.d.).

Warehouse & Honeywell

Honeywell offers a product called Vocollect, a solution that lets warehouses and distribution centers perform their operations by voice. This product reportedly achieved 35% higher productivity while reducing errors by 25% and cutting training time to 50% (“Voice Picking within a Warehouse to Improve Efficiency | Manufacturing & Logistics IT Magazine” n.d.).

Speech Recognition and Maintenance

Maintenance operations are performed across all industries and even at home. Traditionally, maintenance was carried out with large instruction manuals, checklists, and similar documents, and moving through each step took a lot of time. Voice-directed processes have shown proven benefits over conventional paper-based systems.

These systems have been shown to save time and to deliver far greater picking and replenishment accuracy than pen and paper, together with hands-free and eyes-free operation. Companies such as M&I have optimized their maintenance using voice-based maintenance technology (John Bradshaw 2017).

Siemens Taking Maintenance and Speech Recognition to a New Level

Siemens, for its part, has developed a maintenance system in which the technician, instead of calling a colleague for help in a complicated situation, talks to a voice-activated service. This service not only guides the technician through the proper maintenance procedure but also helps the technician work safely (Pease 2017).

Voice Guided Robots in Industry

Voice guidance has also made it possible to command industrial robots. Voice-guided robots can be used in places such as clean rooms, industrial plants, and laboratories, where close cooperation with robots is needed.

Welding and pick-and-place robots have been developed that perform the desired task from voice instructions (John Bradshaw 2017). Human-machine interaction has become easier since the development of speech recognition technology. Pai, Yap, Ramesh et al. (2014) studied how to implement speech recognition in a small manufacturing cell for performing simple tasks in a virtual simulation process. In this study, an augmented reality robotic work cell was created that included a robotic arm, a conveyor belt, a computer numerical control (CNC) machine, and a pallet. Commands were issued using Microsoft’s speech recognition software. They found that a voice interface should not interfere with tasks requiring high precision, and that a stronger sense of immersion allows the user to perform difficult tasks.

Speech Recognition and the Future of Automobiles

Most of us do not know how to fine-tune a car to its driver so that it automatically mimics the driver's style. With advances in AI and speech recognition, however, training a computer to mimic a driver's style patterns is becoming much easier (Alan S. Brown 2016).

Voice recognition, with its extensive capabilities, promises a future in which we communicate with our products, such as our cars, offices, and homes. Product design will also be transformed: users will not need to understand the product, because the product will understand the user.

Speech Recognition in Aircraft

In the future, pilots will use speech recognition for target selection, communication, and flight deck tasks that can currently be performed only by hand. Engineers at the Rockwell Collins Advanced Technology Center are developing a voice control system for aircraft, in which voice recognition in the cockpit is activated by pressing a button. They found that with voice recognition technology, pilots were able to reduce the time required for complicated tasks (Woodrow Bellamy III 2014).

Updated: Oct 10, 2024
Speech Analysis and Synthesis Model. (2021, Dec 14). Retrieved from https://studymoose.com/speech-analysis-and-synthesis-model-essay