Automatic Speech Emotion Recognition using Mel Frequency Cepstrum Co-efficient and Machine Learning Technique

  • Shoaib Mustafa, National College of Business Administration & Economics, Rahim Yar Khan, Punjab, Pakistan
  • Akmal Khan, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
  • Shabir Hussain, School of Information Engineering, Zhengzhou University, China
  • M. Zeeshan Jhandir, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
  • Rafaqat Kazmi, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
  • Imran Sarwar Bajwa, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
Keywords: SER, MFCC, Cepstral coefficients, SVM, Toronto Database, IEMOCAP Database

Abstract

Speech is the quickest and most widely accepted mode of communication between humans. This fact has motivated researchers and scientists to use the speech signal for communication between humans and machines, so that machines can work more efficiently. In human-robot interaction (HRI), emotion recognition is helpful in many applications: emotion is one of the most significant differences between humans and machines, and a machine that responds to emotion is more readily accepted by people. In everyday conversation, the emotion carried in speech is a key cue to the speaker's underlying intention, and automatically identifying emotional states can also assist people who have difficulty understanding and recognizing emotions. Automatic speech emotion recognition is a difficult task whose performance depends on the discriminative power of the speech features used. In this work, an algorithm combining MFCC computation with a Support Vector Machine (SVM) classifier is used to build a speech emotion recognition system covering five emotions: Angry, Happy, Neutral, Pleasant Surprise, and Sadness. Two databases are used for this purpose, the Toronto University speech dataset and the IEMOCAP speech dataset, on which the system achieves 97% and 86% accuracy, respectively. This work can be extended by adding preprocessing steps before feature extraction and by combining the current features with additional features such as pitch and time-domain descriptors. Moreover, evaluating on other popular databases, such as the Berlin speech database, could further improve accuracy.
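For illustration, the sketch below shows a minimal MFCC-plus-SVM pipeline of the kind the abstract describes, assuming the librosa and scikit-learn libraries. The feature settings (13 MFCCs averaged over time), the RBF kernel, and the load_dataset helper and file paths are assumptions for demonstration only, not the authors' exact configuration.

```python
# Minimal sketch of an MFCC + SVM speech emotion recognition pipeline.
# Assumes librosa and scikit-learn are installed; the paths, the
# load_dataset helper, and the label set are hypothetical stand-ins
# for the dataset-specific loading code, which the paper does not show.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

EMOTIONS = ["angry", "happy", "neutral", "pleasant_surprise", "sad"]

def extract_mfcc(path, n_mfcc=13):
    """Load one utterance and return a fixed-length feature vector:
    the mean of each cepstral coefficient over time."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def load_dataset(file_label_pairs):
    """file_label_pairs: iterable of (wav_path, emotion_label) tuples."""
    X = np.array([extract_mfcc(p) for p, _ in file_label_pairs])
    y = np.array([EMOTIONS.index(lbl) for _, lbl in file_label_pairs])
    return X, y

# pairs = [("toronto/angry/utt_001.wav", "angry"), ...]  # dataset-specific
# X, y = load_dataset(pairs)
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)
# clf = SVC(kernel="rbf")  # kernel choice is an assumption
# clf.fit(X_tr, y_tr)
# print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```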

Published
2021-03-17
How to Cite
[1] S. Mustafa, A. Khan, S. Hussain, M. Jhandir, R. Kazmi, and I. S. Bajwa, “Automatic Speech Emotion Recognition using Mel Frequency Cepstrum Co-efficient and Machine Learning Technique”, PakJET, vol. 4, no. 1, pp. 124-130, Mar. 2021.
