Text Dependent And Text Independent Computer Science Essay


This research work aims at designing both text-dependent and text-independent speaker recognition systems based on Mel frequency cepstral coefficients (MFCCs) and a voice activity detector (VAD). The VAD is employed to suppress background noise and to distinguish between silence and voice activity. MFCCs will be extracted from the detected voice samples and compared with the database to recognize the speaker. A new detection criterion is proposed, which is expected to show very good performance in noisy environments.

The system will be implemented on the MATLAB platform, and a new approach for designing a voice activity detector (VAD) has been proposed. To prove the effectiveness of the proposed system, a comparative analysis of the proposed design approach against the artificial neural network technique will be carried out. In recent years there has been a significant amount of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANNs) as a useful technology for speaker recognition.

The performance of both systems will be evaluated under different noisy environments and across different languages and emotions. The overall efficiency of the proposed speaker recognition system depends chiefly on the detection criteria used for recognizing a particular speaker. Global optimization techniques such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) can prove very useful in this context, and therefore a Genetic Algorithm will be employed to set up the detection criteria.
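The proposal does not say how the GA encodes the detection criteria. As a minimal sketch, assume the criterion is a single VAD energy threshold and the fitness is frame-level accuracy on hand-labelled frames; `ga_threshold` and its parameters are hypothetical illustrations, not the method used in this work.

```python
import random

def accuracy(threshold, energies, labels):
    """Fraction of frames classified correctly by the rule 'speech if energy > threshold'."""
    correct = sum((e > threshold) == lab for e, lab in zip(energies, labels))
    return correct / len(labels)

def ga_threshold(energies, labels, pop_size=20, generations=30, mutation_sd=0.05, seed=0):
    """Search for a VAD threshold with a tiny real-valued genetic algorithm (illustrative only)."""
    rng = random.Random(seed)
    lo, hi = min(energies), max(energies)
    population = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank individuals by fitness, best first.
        population.sort(key=lambda t: accuracy(t, energies, labels), reverse=True)
        elite = population[: pop_size // 2]          # truncation selection; elites survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)              # pick two distinct parents
            child = 0.5 * (a + b)                    # arithmetic crossover
            child += rng.gauss(0.0, mutation_sd)     # Gaussian mutation
            children.append(min(max(child, lo), hi)) # clip to the search range
        population = elite + children
    return max(population, key=lambda t: accuracy(t, energies, labels))
```

Because the elite half is carried over unchanged, the best fitness never decreases from one generation to the next.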


Development of speaker recognition systems began in the early 1960s with the exploration of voiceprint analysis, in which the characteristics of an individual's voice were thought to be able to characterize the uniqueness of that individual, much like a fingerprint.

The early systems had many defects, and their detection efficiency was badly affected in the presence of noise. This motivated the development of more reliable methods of predicting the correlation between two sets of speech utterances. Speaker recognition is the process of recognizing the speaker from a database based on characteristics of the speech wave. Most speaker recognition systems contain two phases. In the first phase, feature extraction is done: the unique features of the voice data are extracted and later used to identify the speaker. The second phase is feature matching, in which the extracted voice features are compared with a database of known speakers. The overall efficiency of the system depends on how efficiently the voice features are extracted and on the procedures used to compare the real-time voice sample features with the database.

For security applications such as crime investigations, speaker recognition is one of the best biometric recognition technologies. A person's speech signal can serve as a password for the lock system of a home, cabinet, computer, etc. Speaker recognition can also help verify the voice of a criminal from audio tapes of telephone conversations. The main advantage of a biometric password is that, unlike a knowledge-based password, it cannot be forgotten or misplaced.

Compared with other biometrics, voice biometrics is user-friendly, cost-efficient, convenient and secure. Robust speech recognition systems can be applied to high-accuracy connected-digit recognition, which finds application in recognizing personal identification numbers, credit card numbers and telephone numbers.

The main requirements of a modern speaker recognition system are high accuracy, low complexity and easy computation. The Hidden Markov Model (HMM) has been successfully applied to both isolated-word and continuous speech recognition; however, it fails to address discrimination and robustness issues for classification problems. Acoustic analysis based on MFCCs, which model the human ear [1], has given good results in speaker recognition. Background noise and the microphone used also affect the overall performance of the system [2].

Speaker recognition systems contain three main modules:

(1) Acoustic processing

(2) Feature extraction or spectral analysis

(3) Recognition

All three modules are shown in Fig. 1 and are explained in detail in the subsequent sections.

Fig. 1. Basic structure of a speaker recognition system

Research and development on speaker recognition methods and techniques has been undertaken for more than four decades, and it is still an active area. Many approaches have been used, such as human aural and spectrogram comparisons, simple template matching, dynamic time-warping approaches, and modern statistical pattern recognition approaches such as neural networks and Hidden Markov Models (HMMs). Specific techniques used for speaker recognition include Hidden Markov Models (HMM) [Siohan, 1998], Gaussian Mixture Modeling (GMM) [Reynolds, 1995], multi-layer perceptrons [Altosaar and Meister, 1995], Radial Basis Functions [Finan et al., 1996] and genetic algorithms [Hannah et al., 1993].

Over the last decade, neural networks have attracted a great deal of attention. They offer an alternative approach to computing and to understanding the human brain. Neural networks have the ability to derive meaning from complicated or imprecise data. They can be used to extract patterns and detect trends that are difficult for either humans or other computer techniques to analyze. The advantages offered by neural networks are:

Adaptive learning, self-organization, real-time operation, and fault tolerance via redundant information coding.


Research has focused on feature-based recognition systems. Using features derived from speech, researchers have tried to create reliable, robust and efficient recognition systems. However, variations caused by differences in individual speaker characteristics, emotional variations and noise disturbances increase the complexity of such a system.

Template-matching techniques are used for text-dependent methods. The input speech is represented by a sequence of feature vectors, generally short-term spectral feature vectors. Using a dynamic time warping (DTW) algorithm, the time axes of the input utterance and of each reference template or reference model of the registered speakers are aligned, and the degree of similarity between them, accumulated from the beginning to the end of the utterance, is calculated. Statistical variation in spectral features can be modelled by a Hidden Markov Model (HMM).
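A minimal sketch of DTW template matching, assuming each utterance is already a list of feature vectors (e.g. MFCC frames) and using a Euclidean local distance with the basic three-way step pattern; the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Local distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Accumulated cost of the best DTW alignment between two feature-vector sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in seq_a only
                                 cost[i][j - 1],      # advance in seq_b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

def closest_template(query, templates):
    """Name of the reference template with the lowest DTW cost to the query."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))
```

A time-stretched copy of a template, such as `[[0.0], [0.0], [1.0], [1.0], [2.0]]` against `[[0.0], [1.0], [2.0]]`, scores 0, which is the timing robustness DTW is meant to provide.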

HMM-based methods were introduced as extensions of the DTW-based methods. A new technique for computing verification scores using multiple verification features from the list of scores for a target speaker's background speaker set was introduced by Park, A (2001). This technique was compared with the baseline log-likelihood ratio verification score using global GMM speaker models. It gave no improvement in verification performance.

Zhou, L (2000) used neural networks and fuzzy techniques, applying them to a speaker-independent speech recognition system. Tests on a large number of speech templates of the Chinese digits 0-9, collected from persons from different regions and in noisy environments, gave a recognition rate of 92.2%.

Moonasar, V and Venayagamoorthy, G (2002) proposed a speaker verification system that can be improved and made robust by using a committee of neural networks for pattern recognition rather than the conventional single-network decision system. Supervised Learning Vector Quantization (LVQ) neural networks were used as pattern classifiers. Linear Predictive Coding (LPC) and cepstral signal processing techniques are combined to form hybrid feature parameter vectors, to combat the drop in recognition rate as the number of speakers to be recognized increases.

The most commonly used acoustic vectors are Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), Perceptual Linear Prediction Cepstral (PLPC) coefficients and zero-crossing coefficients (Yegnanarayana et al., 2005; Vogt et al., 2005). All these features are based on spectral information derived from a short time-windowed segment of speech.

They differ mainly in the detail of the power spectrum representation. A new modification of the MFCC feature has been proposed for extracting speech features for speaker verification (SV) applications (Saha and Yadhunandan, 2000). It is compared with the original MFCC-based feature extraction method and also with one of its recent modifications. The study uses the multi-dimensional F-ratio as a performance measure in speaker recognition (SR) applications to compare the discriminative ability of different multi-parameter methods. An MFCC-like feature based on the Bark scale (BFCC) is shown to give performance similar to MFCC in speech recognition experiments (Aronowitz et al., 2005), and BFCC features perform well for text-dependent speaker verification systems. Revised perceptual linear prediction was proposed by Kumar et al. (2010) and Ming et al. (2007) for identifying the spoken language; the Revised Perceptual Linear Prediction Coefficients (RPLP) were obtained from a combination of MFCC and PLP.

The objective of a modelling technique is to generate speaker models using speaker-specific feature vectors. Such models retain the speaker-specific information at a reduced data rate. This is achieved by exploiting the working principles of the modelling techniques. Earlier studies on speaker recognition used direct template matching between training and testing data. In direct template matching, training and testing feature vectors are compared directly using a similarity measure, such as spectral distance, Euclidean distance or Mahalanobis distance (Liu et al., 2006). The disadvantage of template matching is that it becomes time-consuming as the number of feature vectors increases. For this reason, it is common to reduce the number of training feature vectors by a modelling technique such as clustering. The cluster centres are known as code vectors, and the set of code vectors is known as a codebook. The best-known codebook generation algorithm is the K-means algorithm (Mporas et al., 2007; Ming et al., 2007). In 1985, Soong et al. used the LBG algorithm for generating speaker-based vector quantization (VQ) codebooks for speaker recognition. To model statistical variations, the hidden Markov model (HMM) for text-dependent speaker recognition was studied. The performance of neural-network-based systems was also studied (Clarkson et al., 2006). In an HMM, time-dependent parameters are observation symbols, which are created from VQ codebook labels. Continuous probability measures are created using Gaussian mixture models (GMMs) (Krause and Gazit, 2006).
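The codebook idea can be sketched with plain K-means (Lloyd's algorithm) on lists of feature vectors. This is an illustrative assumption: it is not the LBG splitting procedure used by Soong et al., and the initialisation, iteration count and names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_codebook(vectors, k, iterations=20, seed=0):
    """Train a k-entry VQ codebook with plain K-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]  # initialise from training vectors
    for _ in range(iterations):
        # Assignment step: each training vector joins its nearest code vector.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, codebook[c]))
            clusters[nearest].append(v)
        # Update step: move each code vector to the centroid of its cluster.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous code vector
                dim = len(members[0])
                codebook[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook
```

Each speaker's training frames would be compressed into one such codebook, so matching cost no longer grows with the number of training frames.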

The main assumption of an HMM is that the current state depends on the previous state. In 1995, Reynolds proposed the Gaussian mixture modelling (GMM) classifier for the speaker recognition task (Krause and Gazit, 2006; Clarkson et al., 2006). It is the most widely used probabilistic technique in speaker recognition. In the GMM modelling technique, the distribution of feature vectors is modelled by the parameters mean, covariance and weight. GMM outperformed the other modelling techniques. Its disadvantage is that it requires sufficient data to model the speaker well and thus achieve good performance (Aronowitz et al., 2005).
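Scoring with a GMM can be sketched as follows, assuming diagonal covariances and model parameters (weights, means, variances) that have already been trained, e.g. by EM, which is not shown. The speaker whose model gives the highest average log-likelihood is selected; the function names and toy models are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature vectors under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        peak = max(comps)  # log-sum-exp keeps the mixture sum numerically stable
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)
```

Usage: with `speakers = {"A": (weights, means, variances), ...}`, the identified speaker is `max(speakers, key=lambda s: gmm_log_likelihood(frames, *speakers[s]))`.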

Various researchers are still trying to improve the performance of speaker recognition systems. Existing optimization techniques, namely genetic algorithms, particle swarm optimization, neural networks, etc., can come in handy for improving performance.

Description of Broad Area/Topic

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. At the highest level, all speaker recognition systems contain two main modules: feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves the actual procedure to identify the unknown speaker by comparing extracted features from his/her voice input with those from a set of known speakers. Each module is discussed in detail in later sections.

1. Acoustic Processing

Acoustic processing is the sequence of processes that receives an analog signal from a speaker and converts it into a digital signal for digital processing. Human speech frequency usually lies between 300 Hz and 8 kHz [2]. Therefore a 16 kHz sampling rate can be chosen for recording, which is twice the highest frequency of the original signal and so follows the Nyquist sampling rule [3]. Start- and end-point detection of an isolated signal is a straightforward process that detects abrupt changes in the signal through a given threshold energy. The result of acoustic processing is a discrete-time voice signal that contains meaningful information. The signal is then fed into the spectral analyzer for feature extraction.
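The threshold-based start- and end-point detection described above can be sketched as follows. It assumes non-overlapping frames and a fixed energy threshold chosen by hand; at the 16 kHz rate mentioned above, a 20 ms frame would be 320 samples. Names and values are illustrative.

```python
def short_time_energy(signal, frame_len):
    """Energy of consecutive non-overlapping frames."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def detect_endpoints(signal, frame_len, threshold):
    """(start, end) sample indices spanning all frames whose energy exceeds threshold, or None."""
    energy = short_time_energy(signal, frame_len)
    active = [i for i, e in enumerate(energy) if e > threshold]
    if not active:
        return None  # no frame crossed the threshold: treat the input as silence
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

For a signal of 100 silent samples, 200 samples at amplitude 0.5 and 100 more silent samples, `detect_endpoints(signal, 50, 1.0)` returns `(100, 300)`.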

2. Feature Extraction

The feature extraction module provides the acoustic feature vectors used to characterize the spectral properties of the time-varying speech signal, such that its output eases the work of the recognition stage. The main steps involved in feature extraction are explained below:

Feature extraction is the process of extracting a small amount of speaker-specific information, in the form of feature vectors at a reduced data rate, from the input voice signal; these vectors can be used as a reference model representing each speaker's identity. A general block diagram of a speaker recognition system is shown in Fig. 2.

Fig. 2. Speaker recognition system

It is clear from the above diagram that speaker recognition is a 1:N match, in which one unknown speaker's extracted features are matched against all the templates in the reference model to find the closest match. The speaker with maximum similarity is selected.

MFCC Extraction

Mel frequency cepstral coefficients (MFCCs) are probably the best-known and most widely used features for both speech and speaker recognition. A mel is a unit of measure based on the human ear's perceived frequency. The mel scale has approximately linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The approximate mel value for a given frequency can be expressed as:

$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (1)$$

where f denotes the actual frequency in Hz and mel(f) denotes the perceived frequency in mels. The computation of the MFCCs is shown as a block diagram in Fig. 3.

Fig. 3. MFCC extraction
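Equation (1) and its inverse can be written as two small helpers. A base-10 logarithm is assumed, which is the usual convention for the 2595 constant; the function names are illustrative.

```python
import math

def hz_to_mel(f):
    """Equation (1): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of Equation (1), used to place filter edges back on the Hz axis."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

With this convention 1000 Hz maps to roughly 1000 mel, which is the anchor point of the scale.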

In the first stage, the speech signal is divided into frames 20 to 40 ms long with an overlap of 50% to 75%. In the second stage, each frame is multiplied by a window function to minimize signal discontinuities by tapering the beginning and end of each frame to zero. In the time domain, windowing is the point-wise multiplication of the framed signal and the window function. A good window function has a narrow main lobe and low side-lobe levels in its transfer function. In our work a Hamming window is used. In the third stage, the DFT block converts each frame from the time domain to the frequency domain. In the next stage, mel frequency warping is done to map the real frequency scale onto the human perceived frequency scale, called the mel-frequency scale. The new scale is spaced linearly below 1000 Hz and logarithmically above 1000 Hz. Mel frequency warping is normally realized by a triangular filter bank, with the centre frequencies of the filters evenly spaced on the warped (mel) frequency axis. The warped axis is implemented according to Equation (1) so as to mimic the human ear's perception. The output of the i-th filter is given by:

$$Y(i) = \sum_{j=1}^{N} S(j)\,\Omega_i(j), \qquad i = 1, \dots, M \qquad (2)$$

Here S(j) is the N-point magnitude spectrum (j = 1, …, N) and Ω_i(j) is the sampled magnitude response of an M-channel filter bank (i = 1, …, M). In the fifth stage the logarithm of the filter-bank output is computed, and finally the DCT (Discrete Cosine Transform) is applied. The MFCCs may be calculated using the equation:

— — — — – ( 3 )

where N′ is the number of points used to compute the standard DFT.
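Stages one to three described above (framing, Hamming windowing and DFT) can be sketched as below. The frame length, hop and FFT size are assumptions, and the DFT is written out directly for clarity; a real implementation would use an FFT routine such as MATLAB's `fft`. Function names are hypothetical.

```python
import cmath
import math

def frame_signal(signal, frame_len, hop):
    """Stage 1: split the signal into overlapping frames (hop < frame_len gives overlap)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Stage 2 window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def magnitude_spectrum(frame, n_fft):
    """Stage 3: |DFT| of one windowed frame for bins 0..n_fft//2 (direct O(N^2) DFT)."""
    x = list(frame) + [0.0] * (n_fft - len(frame))  # zero-pad to n_fft points
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)]

def spectra(signal, frame_len, hop, n_fft):
    """Frame, window and transform a whole signal; one magnitude spectrum per frame."""
    window = hamming(frame_len)
    return [magnitude_spectrum([s * w for s, w in zip(frame, window)], n_fft)
            for frame in frame_signal(signal, frame_len, hop)]
```

For a 1000 Hz tone sampled at 8 kHz, 64-sample frames with a 32-sample hop (50% overlap) put the spectral peak in bin 8, since each bin spans 8000 / 64 = 125 Hz.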

Fig. 4. Triangular filter bank
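A sketch of the triangular mel filter bank and the final log and DCT stages, assuming the magnitude spectrum covers bins 0 to n_fft/2 and the filter centres are evenly spaced on the mel axis. The DCT used here is the common unnormalised DCT-II; its scaling and indexing may differ from Eq. (3), and all names are hypothetical.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs, f_low=0.0, f_high=None):
    """Triangular filters with centres evenly spaced on the mel axis (one row per filter)."""
    f_high = fs / 2 if f_high is None else f_high
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 equally spaced mel points give every filter a left edge, centre and right edge.
    mels = [mel_lo + (mel_hi - mel_lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    edges = [mel_to_hz(m) * n_fft / fs for m in mels]  # positions in (fractional) DFT bins
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = edges[i - 1], edges[i], edges[i + 1]
        row = []
        for k in range(n_bins):
            if left <= k <= centre:
                row.append((k - left) / (centre - left))    # rising slope
            elif centre < k <= right:
                row.append((right - k) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_from_spectrum(mag, bank, n_ceps):
    """Filter-bank outputs (weighted sums of the spectrum), then log and DCT-II."""
    energies = [sum(s * h for s, h in zip(mag, row)) for row in bank]
    logs = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    m = len(logs)
    return [sum(logs[i] * math.cos(math.pi * n * (i + 0.5) / m) for i in range(m))
            for n in range(n_ceps)]
```

Applying `mfcc_from_spectrum` to each per-frame magnitude spectrum yields one MFCC vector per frame, which is the sequence used later for matching.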

Voice Activity Detector

A Voice Activity Detector (VAD) is used mainly to distinguish the speech signal from silence. The VAD compares features extracted from the input speech signal with a predefined threshold. Voice activity is declared if the measured feature values exceed the threshold; otherwise silence is assumed to be present. The block diagram of the basic voice activity detector used in this work is shown in Fig. 5.

Fig. 5. VAD block diagram

The performance of the VAD depends heavily on the preset threshold values for detecting voice activity. The VAD proposed here works well when the energy of the speech signal is higher than that of the background noise and the background noise is relatively stationary. The amplitudes of the speech signal samples are compared with a threshold value, which is decided by analysing the performance of the system under different noisy environments.
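A minimal sketch of the amplitude-threshold VAD described above, assuming non-overlapping frames and counting a frame as speech when its peak absolute sample exceeds the threshold. The threshold value and names are illustrative, not values tuned in this work.

```python
def vad_frames(signal, frame_len, threshold):
    """Label each non-overlapping frame as speech (True) if its peak |amplitude| exceeds threshold."""
    return [max(abs(s) for s in signal[i:i + frame_len]) > threshold
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def speech_segments(flags, frame_len):
    """Merge runs of consecutive speech frames into (start_sample, end_sample) segments."""
    segments, start = [], None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i                      # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None                   # the run has ended
    if start is not None:                  # speech continues to the end of the signal
        segments.append((start * frame_len, len(flags) * frame_len))
    return segments
```

Only the samples inside the returned segments would be passed on to MFCC extraction.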

3. Feature Matching

a) Using Euclidean Distance

In the recognition phase, a sequence of feature vectors {x1, x2, …, xT} is extracted for the unknown speaker and then compared with the codebooks in the database. For each codebook a distortion measure is computed, and the speaker with the lowest distortion is chosen.

Thus each feature vector of the input is compared with all the codebooks, and the codebook with the minimum average distance is chosen as the best match. The Euclidean distance used for this comparison is defined as follows.

The Euclidean distance between two points P = (p1, p2, …, pn) and Q = (q1, q2, …, qn) is

$$d(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (4)$$

The speaker with the lowest distortion is identified as the unknown person.
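The codebook matching above can be sketched as follows, assuming each speaker's codebook is a list of code vectors (e.g. produced by K-means) and the distortion is the average nearest-code-vector Euclidean distance; speaker names and vectors are hypothetical.

```python
import math

def euclidean(p, q):
    """Equation (4): straight-line distance between two feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def distortion(features, codebook):
    """Average distance from each input vector to its nearest code vector."""
    return sum(min(euclidean(x, c) for c in codebook) for x in features) / len(features)

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: distortion(features, codebooks[spk]))
```

Usage: `identify([[0.1, 0.0], [0.9, 1.2]], {"alice": [[0, 0], [1, 1]], "bob": [[5, 5], [6, 6]]})` selects `"alice"`, since both input vectors lie close to her code vectors.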

b) Neural Networks (NN)

Several popular classification (pattern matching) techniques, such as HMM, GMM, DTW, VQ and NN, are used for speaker recognition. NNs give much lower error rates on small samples, and hence NN was a good choice for our work.

Neural networks consist of layers. Layers are made up of a number of interconnected 'nodes', each of which contains an 'activation function'. Patterns are presented to the network via the 'input layer', which communicates with one or more 'hidden layers' where the actual processing is done via a system of weighted 'connections'. The hidden layers then link to an 'output layer'.
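The layered structure can be illustrated with a forward pass through fully connected layers, assuming a sigmoid activation at every node; the layer sizes and weights are placeholders, not a trained speaker model, and the names are hypothetical.

```python
import math

def sigmoid(z):
    """Activation function applied at every node."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: each node activates on its weighted input sum plus bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, layers):
    """Propagate an input pattern through the hidden layers to the output layer."""
    activations = inputs
    for weights, biases in layers:  # each entry is (weight matrix, bias vector)
        activations = layer_forward(activations, weights, biases)
    return activations
```

In a speaker recognition setting the input layer would receive an MFCC vector and the output layer would have one node per enrolled speaker.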

Most ANNs contain some form of 'learning rule', which modifies the weights of the connections according to the input patterns presented to the network.
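One of the simplest learning rules is Rosenblatt's perceptron rule, sketched below for a single threshold unit. It is assumed here purely to illustrate what a learning rule does; it is not necessarily the training rule that will be used in this work, and the function names are hypothetical.

```python
def perceptron_predict(x, weights, bias):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def perceptron_train(patterns, targets, lr=1.0, epochs=20):
    """Perceptron rule: w <- w + lr * (target - output) * x, applied pattern by pattern."""
    weights, bias = [0.0] * len(patterns[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(patterns, targets):
            err = t - perceptron_predict(x, weights, bias)  # 0 when the output is already correct
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias
```

Trained on the logical AND of two binary inputs, which is linearly separable, the unit converges to weights that reproduce the AND truth table.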


Automatic speaker recognition works on the principle that a person's speech exhibits characteristics that are unique to the speaker. Speech signals in training and testing sessions can never be identical, because people's voices change with time, health conditions, speaking rate, etc. Acoustic noise and variations in recording environments present a challenge to speech recognition. The challenge is to make the system "robust": a system is called robust if its recognition accuracy does not degrade significantly under such conditions.


The goals of this research work are:

1. Develop a new text-dependent and text-independent speaker recognition framework with the help of MFCC and VAD.

2. Dynamically train the speaker recognition system with clean and noisy (additive and convolutive) speech signals. Each time a new speech signal is input to the system, additive white Gaussian noise at different SNR values and reverberation with varying delay values are added to the clean speech signals.

3. Investigate the performance of the proposed text-independent and text-dependent speaker recognition systems under noisy environments.

4. Calculate the accuracy rates of identifying the test speaker in clean and noisy environments using the designed speaker recognition model, and compare them with the artificial neural network based speaker recognition technique.

5. Analyse the best method of removing background noise from the voice signal.
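Goal 2 can be sketched as two corruption functions: additive white Gaussian noise scaled to a target SNR, and a single delayed, attenuated echo as a crude stand-in for reverberation. The single-echo model, gain value and function names are simplifying assumptions; realistic reverberation would use a measured or simulated room impulse response.

```python
import math
import random

def add_white_noise(clean, snr_db, seed=0):
    """Add white Gaussian noise scaled so that 10*log10(P_signal / P_noise) equals snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, noise)]

def add_echo(clean, delay, gain=0.5):
    """Convolutive distortion as one echo: y[n] = x[n] + gain * x[n - delay]."""
    return [s + (gain * clean[n - delay] if n >= delay else 0.0) for n, s in enumerate(clean)]
```

Sweeping `snr_db` (e.g. 20, 10, 0 dB) and `delay` over a range of values would produce the family of noisy training signals described in goal 2.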


Speaker recognition is the process of automatically recognizing who is speaking based on unique characteristics contained in the speech wave. Most speaker recognition systems contain two phases. In the first phase, feature extraction is performed: the unique characteristics of the voice data are extracted so that they can later be used to identify the speaker. In the second phase, feature matching is performed; this phase comprises the actual procedures for identifying the speaker by comparing the extracted voice features with the database of known speakers. The overall efficiency of the system depends on how efficiently the voice features are extracted and on the procedures used to compare the real-time voice sample features with the database.

The following steps will be performed:

a) Voice will be recorded using a microphone

b) Voice activity detection will be performed on the recorded voice

c) Feature extraction using MFCC

d) Speaker recognition using Euclidean distance

e) Comparison of the result obtained in (d) with a neural network

f) Calculation of the % error for (d) and (e)

g) Display of the result on the serial port
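Step (f) amounts to counting misidentified test utterances. A minimal sketch, assuming one predicted and one true speaker ID per test utterance; the function name is hypothetical.

```python
def percent_error(predicted, actual):
    """Percentage of test utterances whose recognised speaker ID is wrong."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return 100.0 * wrong / len(actual)
```

Computing this once for the Euclidean-distance results (d) and once for the neural network results (e) on the same test set gives directly comparable figures; the accuracy rate is simply 100 minus the error.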

Data: This work focuses on developing a system that uses the speech signal for recognition. The speech signal will be recorded using a microphone. The signal is text-dependent: speakers will utter the words that form the database. Different speakers will produce different speech waves.

Tools: The main tool used in this research is the MATLAB software package. The MATLAB DSP (Digital Signal Processing) toolbox and Neural Network toolbox will be used to develop the programs. A GUI will be designed in MATLAB for speaker recognition.

Hardware: The hardware that will be used in this research is:

1. Laptop

2. Intel Pentium Core 2 Duo 1.6 GHz

3. USB PC microphone

Fig. 6 shows the flow chart of the automatic speaker recognition system.

[Flow chart: incoming speech samples enter the VAD block. Is voice activity detected? If not, no speech is present and the system keeps checking the voice input. If so, the MFCCs of the detected voice activity are extracted and passed to the comparison block, which compares the present voice with the database voice sample MFCCs. Is a match found? If so, the speaker is valid and the speaker ID is output; if not, the current voice from the speaker is not present in the database.]

Fig. 6. Flow chart of the speaker recognition system


The complete system will consist of software coded in MATLAB with a graphical user interface, a microphone for capturing voice data, and a hardware circuit connected to the computer via the serial port, used for operating a lock and displaying the result on an LCD.

As soon as the system is activated, the microphone connected to the computer will start capturing voice signals and converting them into electrical signals that can be saved and analyzed.

The MATLAB code will analyse the data captured by the microphone for white noise and background sound, which will be distinguished from voice when it falls below a specified threshold.

This information will be used to filter out the required speech command from the complete voice signal containing noise and background sound. This will be accomplished by generating signals similar to the echo and background sound but 180 degrees out of phase with them, so that they cancel, leaving only the required speech command.

Once the voice command has been successfully extracted from the complete signal, it will be analyzed to obtain the various parameters needed for a successful comparison with the speech in the database.

The extracted characteristics will be:

Fundamental frequencies present in the signal

The amplitude variation of the peaks

The energy envelope present in the signal
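The source does not say how the fundamental frequency will be measured. One common option, assumed here, is to take the lag of the autocorrelation peak within a plausible pitch range (50 to 400 Hz, also an assumption); the function name is hypothetical.

```python
import math

def estimate_f0(frame, fs, f_min=50.0, f_max=400.0):
    """Fundamental frequency (Hz) from the autocorrelation peak between f_min and f_max."""
    n = len(frame)
    min_lag = int(fs / f_max)               # shortest period considered
    max_lag = min(int(fs / f_min), n - 1)   # longest period considered

    def r(lag):
        # Unnormalised autocorrelation at the given lag.
        return sum(frame[i] * frame[i + lag] for i in range(n - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=r)
    return fs / best_lag
```

For a 200 Hz sine sampled at 8 kHz the strongest peak in range is at lag 40, giving 8000 / 40 = 200 Hz.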

These parameters will be compared with the parameters of the speech stored in the database in the form of wave files. A threshold will be defined for each feature: if the difference for every feature is within its specified threshold, the result will be declared true, otherwise false. In either case a data packet associated with the result will be sent over the serial port (UART protocol) to the microcontroller.
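The per-feature threshold decision can be sketched as below. The feature names, threshold values and the one-byte packet format are hypothetical; the actual protocol between the PC and the microcontroller is not specified in the source.

```python
def features_match(measured, reference, thresholds):
    """True only if every feature differs from the stored reference by less than its threshold."""
    return all(abs(measured[name] - reference[name]) < thresholds[name] for name in thresholds)

def result_packet(matched):
    """Hypothetical one-byte UART payload: b'1' for a match, b'0' otherwise."""
    return b"1" if matched else b"0"
```

For example, with reference `{"f0": 180.0, "peak_var": 0.20, "energy": 3.5}` and thresholds `{"f0": 15.0, "peak_var": 0.05, "energy": 0.5}`, a measurement of 172 Hz, 0.22 and 3.8 is accepted, while one at 230 Hz is rejected because its pitch alone falls outside the tolerance.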

The hardware part will consist of a microcontroller, a relay and a 16×2 LCD. On receiving the message from the computer via the serial port (UART protocol), the microcontroller will operate the relay and display a message on the LCD reporting the result as either matched or unmatched. The relay output can further be used to drive an actuator that opens or closes a door.

Cite this page

Text Dependent And Text Independent Computer Science Essay. (2020, Jun 02). Retrieved from http://studymoose.com/text-dependent-and-text-independent-computer-science-new-essay
