Voice scan is a commonly used biometric technique that uses different aspects of a person's voice to verify their identity. It differs from speech recognition, which focuses on transcribing the words a user utters. Voice scan combines physiological aspects (the vocal tract, which shapes the sound) and behavioral aspects (speaking manner and tone). To be verified, the user recites a given phrase or word.

Generic principle of voice scan

A voice biometric system uses an acquisition device such as a microphone, mobile phone, or landline to capture the phrase uttered by the user. The signal is then converted from analog to digital and transmitted to a PC for template generation or matching. Nowadays, voice scan is often integrated with speech recognition systems to combine both functionalities.

Voice data acquisition

The user is instructed to select a phrase and repeat it a specified number of times. The phrase should last 1-1.5 seconds: shorter phrases provide too little data, and longer ones can reduce accuracy. The performance of voice scan depends on the quality of the data and on how much the device and environment vary. For this reason a telephone is commonly used: it is comfortable for users and offers options to filter out noise.
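As a rough illustration of the duration rule above, here is a minimal Python sketch that checks whether each recorded repetition falls inside the 1-1.5 second window. The function names, the 8000 Hz sample rate in the usage lines, and the default of three repetitions are assumptions made for this example; the text only says "a specified number of times".

```python
from typing import List, Sequence

MIN_SECONDS = 1.0  # lower bound taken from the text
MAX_SECONDS = 1.5  # upper bound taken from the text


def duration_seconds(num_samples: int, sample_rate_hz: int) -> float:
    """Length of a recording in seconds."""
    return num_samples / sample_rate_hz


def is_valid_phrase(num_samples: int, sample_rate_hz: int) -> bool:
    """True if the recording lasts between 1 and 1.5 seconds."""
    return MIN_SECONDS <= duration_seconds(num_samples, sample_rate_hz) <= MAX_SECONDS


def enrollment_complete(
    recordings: List[Sequence[float]],
    sample_rate_hz: int,
    required_repetitions: int = 3,  # hypothetical value, not from the text
) -> bool:
    """True once enough recordings of acceptable length were captured."""
    valid = [r for r in recordings if is_valid_phrase(len(r), sample_rate_hz)]
    return len(valid) >= required_repetitions


# Usage: three 1.25 s recordings and one 0.5 s recording at 8000 Hz.
recordings = [[0.0] * 10000, [0.0] * 10000, [0.0] * 10000, [0.0] * 4000]
enough = enrollment_complete(recordings, sample_rate_hz=8000)
```

A real system would also check signal quality, not just length, since the text stresses that performance depends on data quality.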

Data processing and feature extraction

The next step is to process the acquired data. The main task here is removing non-speech frequencies so that a proper template can be obtained. From the processed data, features such as pitch, quality, fundamental frequency, intensity, nasal articulation, and the spectrogram are extracted. These features can be replicated only by humans, which makes them well suited to voice template creation, a process that relies on capturing multiple samples.
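To make two of these features concrete, the sketch below estimates fundamental frequency with a simple autocorrelation search and computes intensity as the root mean square of the samples. This is one common textbook approach, not necessarily what commercial voice scan products do. The 50-400 Hz search range is an assumption about typical speaking pitch; candidates outside it are simply ignored.

```python
import math
from typing import Sequence


def rms_intensity(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude, a simple stand-in for intensity."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def estimate_f0(
    frame: Sequence[float],
    sample_rate_hz: int,
    fmin_hz: float = 50.0,   # assumed lower bound of speaking pitch
    fmax_hz: float = 400.0,  # assumed upper bound of speaking pitch
) -> float:
    """Estimate fundamental frequency in Hz; returns 0.0 if none is found."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]  # remove any constant offset
    min_lag = int(sample_rate_hz / fmax_hz)
    max_lag = min(int(sample_rate_hz / fmin_hz), len(x) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation: how well the frame matches itself shifted by `lag`.
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag if best_lag else 0.0


# Usage: a synthetic 200 Hz tone sampled at 8000 Hz for 50 ms.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
f0 = estimate_f0(tone, sample_rate_hz=8000)
intensity = rms_intensity(tone)
```

Real speech is far messier than a pure tone, so production systems work on short overlapping frames and combine many more features than these two.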

Template generation and matching

Template matching is based on statistics, and the most commonly used approach is the hidden Markov model. The model is a generalized profile formed by comparing multiple samples to identify what repeats across them. The result is stored as a template and compared with input samples. The templates can be as large as 10000, but the storage space is less.
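A full hidden Markov model is too long to show here, so the sketch below swaps in a much simpler statistical profile: the per-feature mean and variance across the enrollment samples. It is not an HMM, but it illustrates the same idea of generalizing over multiple samples. The function name, the dictionary layout, and the variance floor are assumptions made for this example.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence

VARIANCE_FLOOR = 1e-6  # assumed floor so no feature ends up with zero variance


def build_template(samples: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Summarize several feature vectors as per-feature mean and variance."""
    if len(samples) < 2:
        raise ValueError("need at least two enrollment samples")
    if len({len(s) for s in samples}) != 1:
        raise ValueError("all feature vectors must have the same length")
    columns = list(zip(*samples))  # one tuple per feature
    return {
        "mean": [mean(c) for c in columns],
        "var": [max(pvariance(c), VARIANCE_FLOOR) for c in columns],
    }


# Usage: two enrollment samples with two made-up features each.
template = build_template([[1.0, 2.0], [3.0, 2.0]])
```

Features that barely change between repetitions end up with a small variance, so later deviations in them weigh more heavily during matching.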

Verification by voice scan system

When a user attempts verification, the system compares the input with the template and returns a statistical likelihood score. The challenge for such a system is that the user may speak at a different pitch, speed, or volume from one attempt to the next. As the technology has evolved, better algorithms have been developed to overcome this challenge.
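The "statistical rating of likelihood" can be sketched as a log-likelihood score compared against a threshold. The example below assumes a template made of per-feature means and variances (a simple Gaussian profile rather than a hidden Markov model). The feature values and the threshold of -10.0 are made up for illustration; a real system would tune the threshold on recorded data.

```python
import math
from typing import Sequence, Tuple


def log_likelihood(
    features: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Log-likelihood of the features under independent Gaussians."""
    total = 0.0
    for x, m, v in zip(features, means, variances):
        total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def verify(
    features: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    threshold: float,
) -> Tuple[bool, float]:
    """Return (accepted, score); accepted when the score reaches the threshold."""
    score = log_likelihood(features, means, variances)
    return score >= threshold, score


# Usage with a hypothetical two-feature template (e.g. pitch in Hz, intensity).
means, variances = [120.0, 0.5], [25.0, 0.01]
accepted, close_score = verify([122.0, 0.52], means, variances, threshold=-10.0)
rejected, far_score = verify([180.0, 0.9], means, variances, threshold=-10.0)
```

Because each difference is divided by the variance, normal day-to-day changes in pitch or volume cost little, while large deviations push the score down quickly.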


Voice scan is used mainly in post-release programs to monitor compliance with probation, parole, and home detention. Traditional tracking methods require officers, manpower, and other resources, and they are slow. To overcome this, T-NETIX and Buytel came up with an automated system that telephones the user at regular intervals and uses the answering voice to verify whether the person is at home. If the voice doesn’t match or the system doesn’t receive any input, an alert is sent to the officers. This is how voice scan is implemented in real life.
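The alert rule described above is simple enough to sketch in a few lines of Python. This is only an illustration of the logic, not the actual T-NETIX or Buytel implementation. The names, the score scale, and the threshold are assumptions; a score of None stands for a call where no input was received.

```python
from typing import List, Optional, Sequence


def needs_alert(match_score: Optional[float], threshold: float) -> bool:
    """Alert if the call got no input or the voice did not match."""
    if match_score is None:  # no answer or no speech received
        return True
    return match_score < threshold


def calls_to_flag(scores: Sequence[Optional[float]], threshold: float) -> List[int]:
    """Indexes of the scheduled calls that should raise an alert."""
    return [i for i, s in enumerate(scores) if needs_alert(s, threshold)]


# Usage: four scheduled calls; the second went unanswered, the fourth mismatched.
flagged = calls_to_flag([0.93, None, 0.88, 0.41], threshold=0.8)
```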

Credits to Mary for helping with the blog!!
