Forensic Sciences


Spectrographic and Statistical Analysis of Speech Recorded Through Different Recording Devices

Article Number: TEY326597 Volume 06 | Issue 01 | April - 2023 ISSN: 2581-4273
24th Jan, 2023
07th Feb, 2023
24th Feb, 2023
28th Apr, 2023

Authors

Moulya B P, Geetam Shukla

Abstract

Speaker identification is the determination of which speaker produced a given utterance, and speaker verification is the confirmation, from an utterance, of whether a speaker is who he or she claims to be. Because people often deny their own voice during criminal investigations, this technique aids in resolving such cases and in identifying the guilty. This research paper involves recognizing and verifying speakers based on the intonation patterns of the words they speak. It is a sample-based investigation: specimens from ten people are analysed, five from the northern part of India and five from the southern part of India. This report emphasizes the variations in intonation patterns that result from different recording devices. The audio recording devices include a handset, a recorder, and a laptop. The differences are noted, and the extent to which they can be relied upon is also examined.

Keywords: Speaker Recognition, Intonation Patterns, Intensity, Pitch, Formants, Standard Deviation.

Introduction

The term speaker recognition covers speaker identification, which is the determination of which speaker produced a given utterance, and speaker verification, which refers to the confirmation from an utterance of whether a speaker is who he or she claims to be (Almaadeed et al., 2015). Unlike other biometric characteristics such as fingerprints and faces, the human voice is a biometric characteristic that is not yet frequently employed for person identification. In automatic speaker recognition, a system uses a recording of a speaker's speech to verify or ascertain the speaker's identity (Mokgonyane et al., 2021).

There are two types of speaker recognition: text-dependent and text-independent. When the lexicon of the spoken utterances is limited to a single word or phrase across all speakers, the process is known as text-dependent speaker verification; in text-independent speaker verification, speakers are free to say whatever they want, without their utterances being constrained.

In this paper the approach is based on text-dependent verification for the comparison of the intonation patterns of speech signals. The basic idea rests on the fact that the vocal tract is what distinguishes one speaker's voice from another's. Each person's vocal tract is unique in size and shape, which creates differences in pitch frequency, intensity, and formant frequencies (Chaubey et al., 2022).

Whenever a person speaks, sound waves are produced: air exhaled from the lungs passes through the vocal folds, which come together and vibrate, and this vibration produces the sound wave. Sound waves are characterised by frequency, amplitude, and wavelength. The frequency, perceived as pitch, is the number of times per second that a sound pressure wave repeats itself. The (quasi-)periodic structure of voiced speech signals is approximated by the fundamental frequency of the speech signal, often denoted F0. The vocal folds, when appropriately tensed, create an oscillation in the airflow; the mean number of oscillations per second, measured in Hertz, is the fundamental frequency. Formant frequencies, known as F1, F2, and F3, are created when this fundamental frequency is amplified or attenuated by different parts of the resonating vocal tract (Ali et al., 2006; Magdin et al., 2019).
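To illustrate the idea of F0 described above, a fundamental frequency can be estimated from a short voiced frame by autocorrelation, one common technique (Praat itself uses a refined autocorrelation method). The sketch below is a minimal illustration on a synthetic 150 Hz tone rather than real speech; the function name and search range are assumptions for the example, not the paper's method.

```python
import numpy as np

def estimate_f0(frame, sample_rate, f_min=75.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame
    by finding the strongest autocorrelation peak within a
    plausible pitch range."""
    frame = frame - np.mean(frame)                 # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                   # keep non-negative lags
    lag_min = int(sample_rate / f_max)             # shortest plausible period
    lag_max = int(sample_rate / f_min)             # longest plausible period
    best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / best_lag

# Synthetic voiced frame: a 150 Hz tone sampled at 16 kHz for 50 ms
sr = 16000
t = np.arange(0, 0.05, 1 / sr)
frame = np.sin(2 * np.pi * 150.0 * t)
print(f"estimated F0: {estimate_f0(frame, sr):.1f} Hz")  # close to 150 Hz
```

Real speech frames are noisier and require voicing detection and smoothing, which is why dedicated tools such as Praat are used in practice.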

Intensity is the energy of a sound wave and corresponds to the degree of loudness associated with it; it is measured in decibels. Intensity helps us to know how a person utters certain words. The amplitude of a sound wave is its height, that is, the maximum distance that the medium's vibrating particles are displaced from their rest position during sound production. Because amplitude and pitch vary across a speaker's recordings, no two digital signals are identical, even when the same words are spoken by the same speaker (Almaadeed et al., 2015).
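The decibel relationship described above follows directly from the logarithm of the signal's RMS amplitude: a tenfold change in amplitude corresponds to 20 dB. A minimal sketch, using synthetic tones rather than the paper's recordings (the reference level here is an arbitrary assumption):

```python
import numpy as np

def intensity_db(frame, ref=1.0):
    """Relative intensity in decibels, computed from the
    frame's root-mean-square (RMS) amplitude."""
    rms = np.sqrt(np.mean(np.square(frame)))
    return 20.0 * np.log10(rms / ref)

sr = 16000
t = np.arange(0, 0.05, 1 / sr)
loud = 0.5 * np.sin(2 * np.pi * 150.0 * t)    # larger amplitude
soft = 0.05 * np.sin(2 * np.pi * 150.0 * t)   # 10x smaller amplitude

# A 10x amplitude ratio is a 20 dB difference in intensity
print(f"difference: {intensity_db(loud) - intensity_db(soft):.1f} dB")
```

Praat's intensity track is computed the same way in principle, frame by frame over the recording.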

The intonation patterns are evaluated in the software Praat. Praat converts speech to digital signals and presents both a waveform view and a spectrogram view of a sound. The waveform view gives amplitude information over time, whereas the spectrogram shows frequency information over time, with amplitude indicated by shading. Praat enables us to determine a sound's pitch, intensity, and formant values. In this paper, the intonation patterns of the words are examined using these parameters. The findings and observations are listed in the table below, along with the conclusion (Bharti and Bansal, 2015; Wirdayanthi, 2022).

References

Ali, Ahmed, et al. Formants Based Analysis for Speech Recognition. 2006, https://doi.org/10.1109/iceis.2006.1703179.

Almaadeed, Noor, et al. “Text-Independent Speaker Identification Using Vowel Formants.” Journal of Signal Processing Systems, vol. 82, no. 3, 2015, pp. 345–356, https://doi.org/10.1007/s11265-015-1005-5.

Bharti, Roma, and Priyanka Bansal. “Real Time Speaker Recognition System Using MFCC and Vector Quantization Technique.” International Journal of Computer Applications, vol. 117, no. 1, 2015, pp. 25–31, https://doi.org/10.5120/20520-2361.

Chaubey, Ashutosh, et al. “Improved Relation Networks for End-to-End Speaker Verification and Identification.” Interspeech 2022, 2022, https://doi.org/10.21437/interspeech.2022-10064.

Magdin, Martin, et al. “Voice Analysis Using PRAAT Software and Classification of User Emotional State.” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 6, 2019, p. 33, https://doi.org/10.9781/ijimai.2019.03.004.

Mokgonyane, Tumisho Billson, et al. “A Cross-Platform Interface for Automatic Speaker Identification and Verification.” 2021 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (IcABCD), 2021, https://doi.org/10.1109/icabcd51485.2021.9519322.

Wirdayanthi, A.A. Istri. “Utilization of PRAAT in Determining the Authenticity of Voice.” IJFL (International Journal of Forensic Linguistic), vol. 3, no. 1, Apr. 2022, pp. 81–89, https://www.ejournal.warmadewa.ac.id/index.php/ijfl/index.
