Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms

Sameen Aziz; Saleem Ullah; Bushra Mughal; Faheem Mushtaq; Sabih Zahra

doi:10.51846/vol3iss2pp172-177

Sameen Aziz, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan
Saleem Ullah, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan
Bushra Mughal, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan
Faheem Mushtaq, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan
Sabih Zahra, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Keywords: Machine Learning, TF-IDF, Kaggle, SVM, RF, Logistic Regression, Naïve Bayes, AdaBoost, RANSAC, Hyperparameter

Abstract

People use social media and e-commerce websites because they find them an easy and comfortable way to express their feelings about a topic, post, or product. In Asia, people mostly use the Roman Urdu script to express their opinions. Sentiment analysis of Roman Urdu (Bilal et al. 2016) is a challenging task for researchers because of the lack of resources and the unstructured, non-standard syntax and script of the language. We collected a manually annotated dataset of 21,000 entries from Kaggle and prepared it for machine learning. We then applied several machine learning algorithms (SVM, Logistic Regression, Random Forest, Naïve Bayes, AdaBoost, KNN) (Bowers et al. 2018) with different parameters and kernels, using TF-IDF features (unigram, bigram, uni-bigram) (Pereira et al. 2018), to find the best-fitting algorithm. From the best-performing algorithms we chose four and combined them for use on the dataset. After hyperparameter tuning with Grid Search CV (Ezpeleta et al. 2018) and 5-fold cross-validation, the best model was the Support Vector Machine with a linear kernel, which achieved 80% accuracy, an F1 score of 0.79, precision of 0.79, and recall of 0.78. We then performed experiments on robust linear regression model estimation using RANSAC (Random Sample Consensus) (Huang, Gao, and Zhou 2018; Chum and Matas 2008), which gave the best estimators with 82.19%.
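
The tuning step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, and the toy sentences, labels, and the grid of C values are hypothetical stand-ins for the Kaggle dataset and the parameters actually searched. It shows a TF-IDF vectorizer tried with unigram, bigram, and uni-bigram ranges, feeding a linear-kernel SVM, with the combination selected by 5-fold grid search.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for the Kaggle Roman Urdu dataset.
texts = [
    "yeh phone bohat acha hai",
    "mujhe yeh product pasand aya",
    "service bohat achi thi",
    "khana bohat mazedar tha",
    "delivery time par hui shukriya",
    "quality zabardast hai bohat acha",
    "yeh phone bilkul bekar hai",
    "mujhe yeh product pasand nahi aya",
    "service bohat buri thi",
    "khana bilkul kharab tha",
    "delivery bohat late hui",
    "quality ghatiya hai paisa zaya",
]
labels = [1] * 6 + [0] * 6  # 1 = positive, 0 = negative (assumed encoding)

# TF-IDF features followed by a linear-kernel SVM.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC(kernel="linear")),
])

# Search over the three n-gram settings named in the abstract.
# The C values are illustrative assumptions, not taken from the paper.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (2, 2), (1, 2)],  # unigram, bigram, uni-bigram
    "svm__C": [0.1, 1, 10],
}

# 5-fold cross-validated grid search, scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
```

On a real dataset, the same search object could also report precision, recall, and F1 by passing a different `scoring` value; the numbers printed for this toy data say nothing about the results reported in the paper.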

Author Biographies

Saleem Ullah, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Head of the Computer Science Department, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Bushra Mughal, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Lecturer at Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Faheem Mushtaq, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Head of the Information Technology Department, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Sabih Zahra, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Ph.D. scholar in the I.T. Department, Khwaja Freed University of Engineering and Information Technology Rahim Yar Khan, Pakistan

Published

2020-10-22

How to Cite

[1]

S. Aziz, S. Ullah, B. Mughal, F. Mushtaq, and S. Zahra, “Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms”, PakJET, vol. 3, no. 2, pp. 172-177, Oct. 2020.

Issue

Vol 3 No 2 (2020): Pakistan Journal of Engineering and Technology (Supplementary Issue)

Section

Research Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

COPYRIGHT POLICY

UOL journals follow an open-access publishing policy and full text of all articles is available free, immediately upon acceptance. Articles are published and distributed under the terms of the CC BY-SA 4.0 International License. Thus, work submitted to UOL Journals implies that it is original, unpublished work of the authors; neither published previously nor accepted/under consideration for publication elsewhere.

Authors will be responsible for any information written/informed/reported in the submitted manuscript. Although we do not require authors to submit the data collection documents and coded sheets used to do quantitative or qualitative analysis, we may request them at any time during the publication process, including after the article has been published. It is the author's responsibility to obtain signed permission from the copyright holder to use and reproduce text, illustrations, tables, etc., published previously in other journals, electronic or print media.

Conflict of interest statements will be published at the end of the article. If no conflict of interest exists, the following sentence will be used: "The authors declare no conflict of interest." Authors are required to disclose any sponsorship or funding received from any institution relating to their research. The editor(s) will determine what disclosures, if any, should be available to the readers.

Authors are not permitted to post the work on any website/blog/forum/board or at any other place, by any means, from the time such work is submitted to UOL journals until the final decision on the paper has been given to them. In case a paper is accepted for publication, the authors may not post the work in its entirety on any website/blog/forum/board or at any other place, by any means, till the paper is published in UOL Journals.

The authors may, however, post the title, authors’ names and their affiliations and abstract, with the following statement on the first page of the paper - "The manuscript has been accepted for publication in UOL Journals". After publication of the article, it may be posted anywhere with full journal citation included.

All articles published in UOL journals are open-access articles, published and distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License, which permits remixing, transformation, or building upon the material, provided the original work is appropriately cited, mentioning the authors and the publisher, and the produced work is distributed under the same license as the original.

In the future, UOL may reproduce printed copies of articles in any form. Without prejudice to the terms of the license given below, we retain the right to reproduce authors' articles in this way.

Brief Summary Of The License Agreement

By submitting your research article(s) to UOL Journal(s), you agree to the Creative Commons Attribution-ShareAlike 4.0 International License, which states that:

Anyone is free:

o To copy and redistribute the material in any medium or format
o To remix, transform, or build upon the material for any purpose, even commercially

Provided:

o The author and the publisher have been appropriately credited
o The link to license is provided
o Any changes made are indicated
o The material produced is distributed under the same license as the original
