Automatic Speech Emotion Recognition using Mel Frequency Cepstrum Co-efficient and Machine Learning Technique

  • Shoaib Mustafa, National College of Business Administration & Economics, Rahim Yar Khan, Punjab, Pakistan
  • Akmal Khan, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
  • Shabir Hussain, School of Information Engineering, Zhengzhou University, China
  • M. Zeeshan Jhandir, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
  • Rafaqat Kazmi, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
  • Imran Sarwar Bajwa, Department of Computer Science, The Islamia University of Bahawalpur, Pakistan
Keywords: SER, MFCC, Cepstral coefficients, SVM, Toronto Database, IEMOCAP Database

Abstract

Speech is the quickest and most widely accepted mode of communication between humans. This fact has motivated researchers and scientists to use the speech signal for communication between humans and machines, so that machines can work more efficiently. In human-robot interaction (HRI), emotion recognition is helpful in many applications: emotion is one of the most significant differences between humans and machines, and a machine that responds to emotion is more readily accepted by people. In everyday conversation, the emotion carried in speech is a key cue to the speaker's underlying intention, and automatically identifying emotional states can also assist people who have difficulty understanding and recognizing emotions. Automatic speech emotion recognition is a difficult task whose performance depends on the discriminative power of the speech features used. In this work, an algorithm combining MFCC computation with a Support Vector Machine (SVM) classifier is used to build a speech emotion recognition system covering five emotions: Angry, Happy, Neutral, Pleasant Surprise, and Sadness. Two databases are used for this purpose, the Toronto University speech dataset and the IEMOCAP speech dataset, on which the system achieves 97% and 86% accuracy, respectively. This work can be extended by adding preprocessing steps before feature extraction and by combining the current features with additional features such as pitch and time-domain descriptors. Moreover, evaluating on other popular databases, such as the Berlin speech database, could further improve accuracy.
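For illustration, the sketch below shows a minimal MFCC-plus-SVM pipeline of the kind the abstract describes, assuming the librosa and scikit-learn libraries. The feature settings (13 MFCCs averaged over time), the RBF kernel, and the load_dataset helper and file paths are assumptions for demonstration only, not the authors' exact configuration.

```python
# Minimal sketch of an MFCC + SVM speech emotion recognition pipeline.
# Assumes librosa and scikit-learn are installed; the paths, the
# load_dataset helper, and the label set are hypothetical stand-ins
# for the dataset-specific loading code, which the paper does not show.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

EMOTIONS = ["angry", "happy", "neutral", "pleasant_surprise", "sad"]

def extract_mfcc(path, n_mfcc=13):
    """Load one utterance and return a fixed-length feature vector:
    the mean of each cepstral coefficient over time."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def load_dataset(file_label_pairs):
    """file_label_pairs: iterable of (wav_path, emotion_label) tuples."""
    X = np.array([extract_mfcc(p) for p, _ in file_label_pairs])
    y = np.array([EMOTIONS.index(lbl) for _, lbl in file_label_pairs])
    return X, y

# pairs = [("toronto/angry/utt_001.wav", "angry"), ...]  # dataset-specific
# X, y = load_dataset(pairs)
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y)
# clf = SVC(kernel="rbf")  # kernel choice is an assumption
# clf.fit(X_tr, y_tr)
# print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```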

Published
2021-03-17
How to Cite
[1] S. Mustafa, A. Khan, S. Hussain, M. Jhandir, R. Kazmi, and I. S. Bajwa, “Automatic Speech Emotion Recognition using Mel Frequency Cepstrum Co-efficient and Machine Learning Technique”, PakJET, vol. 4, no. 1, pp. 124-130, Mar. 2021.
