Covid-19 Detection using Deep Convolutional Neural Networks from X-Ray Images

COVID-19, which first appeared in the city of Wuhan, China, in December 2019, spread quickly around the world and became a pandemic. It has had an overwhelming effect on daily life, public health, and the global economy. Many people have been affected and have died. It is critical to control and prevent the spread of COVID-19 by applying a quick alternative diagnostic technique. On-time diagnosis of COVID-19 is a long and difficult process. Moreover, COVID-19 diagnostic test kits are costly and not available to every individual in poor countries. For this purpose, screening patients with an established technique such as chest X-ray imaging seems to be an effective method. This study employed a deep learning technique on a publicly available dataset. The proposed model was tested using standard evaluation measures and compared favorably with earlier results. The highest accuracy achieved in validation was 96.67%, with an average F-measure score of 98% and an Area Under the Curve (AUC) of 99%.


I. INTRODUCTION
Detecting COVID-19 infection early is vital to stop it from passing from one person to another and spreading further. Polymerase chain reaction (PCR) testing is the primary technique used for COVID-19 diagnosis, but it can produce up to 30% false-negative results and generally takes days to return results [1][2][3]. These issues have adverse effects on public wellbeing, as COVID-19 has spread quickly across the globe and has taken many lives in each country, as shown in Table I [4]. Medical imaging techniques, such as X-rays and Computed Tomography (CT) scans, are used to evaluate and analyze the effects of COVID-19 [5]. For the analysis of COVID-19, we applied our model to two datasets of images containing X-rays of healthy and COVID-19 patients. The analysis of this data is performed with a Convolutional Neural Network (CNN). The focus of this study is on using CNN models to identify COVID-19. Existing models can be used to classify COVID-19 patients if they are made more effective, to improve their value in real-time circumstances [6]. Section II of this paper also discusses the impact of deep learning on the classification of COVID-19. The results, although obtained on moderately small datasets, show good accuracy in detecting COVID-19. We used a small dataset of X-rays from COVID-19 and non-COVID-19 patients to see whether it can help with early detection of COVID-19. Ordinary pneumonia can sometimes be difficult to differentiate from COVID-19, so we applied the CNN model to see whether it can differentiate the two from chest X-rays. A limitation of our data is that the dataset is small, and an expert radiologist's opinion is needed to verify the results. There are currently three different ways in which CNN techniques are used for the classification of medical images:
• Use of a pre-built CNN model
• Training a CNN model from scratch
• Unsupervised training on images followed by fine-tuning
The contributions of our paper are as follows:
1) A model trained on publicly available datasets of X-ray images.
2) Detection of COVID-19 by analyzing X-ray images.
3) The highest accuracy achieved in validation was 96.67%, with an average F-measure score of 98% and an Area Under the Curve (AUC) of 99%. The results obtained by our model are far superior to those of the studies in the literature.

II. LITERATURE REVIEW
With the rapid rise in COVID-19 cases and the growing efficiency of deep learning techniques, deep learning is being applied to the analysis of chest X-rays for timely detection, to help contain the spread of the disease. Many papers that apply deep learning to X-rays for COVID-19 perform binary or multi-class classification [7]. Most studies use the raw X-ray data, while others first extract features from the X-ray images [8]. There is no fixed dataset size, as the available data grows day by day and varies in origin. Across most studies, the preferred method for COVID-19 detection from X-ray images is the CNN. Advanced CNN models can be trained accurately to learn specific and precise aspects of COVID-19 [9]. Earlier studies of COVID-19 from chest X-rays likewise use binary or multi-class groupings, work either on raw data or on extracted features, and differ in the datasets they use; among them, the most favored approach is again the CNN [10]. With the help of transfer learning, recognizing different abnormalities in small clinical image datasets looks like a feasible objective, with outstanding outcomes [11]. X-ray images have been used for the diagnosis of COVID-19 with ResNet, VGG16, InceptionV3, and DenseNet models [12]. In studies that emphasize feature extraction and segmentation of X-ray images, COVID-19 was most frequently classified using a CNN.
In a study from March 2020, ResNet50, InceptionV3, and Inception-ResNetV2 models were used to analyze COVID-19 from chest X-ray images [13]. Paul et al. applied 10-fold cross-validation to ResNet50 with a dataset of 102 COVID-19 cases and 102 other (pneumonia) cases. They achieved 89.2% with 80.39% overall accuracy. The true negative rate (TNR) was 0.99, with only 1 false positive and an overall AUC of 0.95. Sarkar et al. used X-rays to detect pneumonia with a CNN based on separable convolutions [14]. Jaiswal et al. used X-rays to detect pneumonia with Mask-RCNN [15]. Wang et al. [1] classified COVID cases from chest X-rays with a deep CNN model. Their dataset consisted of over 14,000 chest X-rays, and the classification accuracy obtained was 98.9%. Afshar et al. proposed COVID-CAPS, which uses three capsule network layers with four convolutional layers [16]. These studies establish the importance of chest X-rays for detecting COVID-19, and the use of deep CNN models warrants further exploration in this field. When the relationship between tasks is not standardized, the algorithm cannot decide which tasks are related, which makes finding solutions more challenging. Overfitting is a significant obstacle for almost all prediction technologies [17].
In the context of CNNs, overfitting can occur when noise in the training data interferes with the new model and negatively influences its output. A major disadvantage of CNNs is that they can lose the pose and orientation of the internal data by routing it to a single neuron that may not be able to process this information [18].
A CNN model looks for certain features in an image to make its predictions. If the features are present, it classifies the image accordingly. In a CNN, higher-level neurons receive their inputs from lower-level neurons in a hierarchy [19]. The neurons perform convolution to check whether specific features are present, and the same learned feature detectors are replicated across the neurons of the model. CNNs are reported to be insensitive to scale and rotation and are therefore less suited to images of small dimensions [20]. Most preceding studies suffer from imbalanced or very small datasets. To avoid these limitations, using two open-source datasets seemed more appropriate. In the last year alone, a dozen studies have been performed in this field, but there is still room for improvement. In this study, we propose a deep CNN model for COVID-19 detection from chest X-ray images.

III. MATERIALS AND METHODS
This section describes the proposed COVID-19 detection method using a deep CNN model. First, features are extracted from the images by applying a convolution layer with a variety of masks. This yields low-level features; the CNN model then extracts the relevant features through further convolution layers, which are followed by fully connected layers. There are several deep learning algorithms, ranging from CNNs to RNNs [21]. A CNN is applicable when the data come from a domain such as image processing. In this study, a deep learning CNN architecture is used to detect COVID-19 cases.
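To make the idea of applying a mask in a convolution layer concrete, the following is a minimal, dependency-free sketch. The image values and the vertical-edge mask are hypothetical illustrations, not the filters learned by the model in this study, and the function name `conv2d_valid` is introduced only for this example.

```python
def conv2d_valid(image, mask):
    """Slide `mask` over `image` (no padding, stride 1) and sum the products.

    As in most deep learning libraries, the mask is not flipped, so this is
    technically a cross-correlation.
    """
    image_h, image_w = len(image), len(image[0])
    mask_h, mask_w = len(mask), len(mask[0])
    feature_map = []
    for r in range(image_h - mask_h + 1):
        row = []
        for c in range(image_w - mask_w + 1):
            acc = 0.0
            for i in range(mask_h):
                for j in range(mask_w):
                    acc += image[r + i][c + j] * mask[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hypothetical 3x3 mask that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The response is zero over the uniform region and positive where the window covers the edge, which is the sense in which a mask detects whether a feature is present.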

A) PROPOSED ARCHITECTURE
To detect COVID-19 cases, we built a very simple CNN model (Fig. 1) consisting of 4 convolutional layers, each using a small filter of size 3x3 for feature extraction. Max pooling layers of size 2x2 were used in the model, followed by a classifier layer with a sigmoid output. To preprocess the images, we used the Keras data generator. The input layer reads the preprocessed image dataset. The X-ray images are preprocessed by cropping and resizing, for two reasons: first, medical devices print letters and medical symbols on the images they produce; second, because the images come from various sources, their dimensions and sizes differ. The image size is therefore set to 224-by-224-by-3 (width-by-height-by-channel number) for this study, and the lung and chest areas are cropped as tightly as possible to remove any writing or symbols. To evaluate the proposed method, we used images from 2 distinct sources to build a dataset of 284 X-rays. The distribution of the images by source is given in Table II. These datasets were chosen because the images are diverse: they were collected from different countries and distinct sources, which plays an important role in helping radiologists design a tool for the diagnosis of COVID-19 around the globe. In addition, these sources are open to the public and the research community. The COVID images used in this study were obtained from a GitHub repository [22], and the non-COVID X-rays from a Kaggle repository [23].
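As a rough illustration of how the 224-by-224-by-3 input shrinks through such a network, the sketch below traces feature-map shapes and parameter counts. Several details are assumptions not stated in the text: the filter counts (32, 64, 128, 128), no padding with stride 1, a 2x2 max pooling layer after every convolutional layer, and a single sigmoid unit fed directly by the flattened features.

```python
def conv_size(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_shape=(224, 224, 3), filters=(32, 64, 128, 128)):
    """Return (layer name, output shape, parameter count) for each layer.

    The filter counts are hypothetical; the paper only states that there are
    4 convolutional layers with 3x3 filters and 2x2 max pooling.
    """
    h, w, channels = input_shape
    layers = []
    for index, n_filters in enumerate(filters, start=1):
        # Each filter has 3*3*channels weights plus one bias.
        params = (3 * 3 * channels + 1) * n_filters
        h, w = conv_size(h), conv_size(w)
        layers.append((f"conv{index} 3x3", (h, w, n_filters), params))
        # 2x2 max pooling halves each spatial dimension (rounding down).
        h, w = h // 2, w // 2
        layers.append((f"maxpool{index} 2x2", (h, w, n_filters), 0))
        channels = n_filters
    flat = h * w * channels
    layers.append(("flatten", (flat,), 0))
    # One sigmoid unit: one weight per flattened feature plus one bias.
    layers.append(("dense sigmoid", (1,), flat + 1))
    return layers


if __name__ == "__main__":
    for name, shape, params in trace_shapes():
        print(f"{name:<16} {str(shape):<18} {params}")
```

Under these assumptions the spatial size goes 224, 222, 111, 109, 54, 52, 26, 24, 12, so the classifier sees a 12x12 feature map per filter; different padding or filter counts would change these numbers.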
It is evident from Table II that, although the datasets are growing day by day, the openly available images are not adequate for further research in this area, and more radiology images need to be gathered and made easily accessible to the research community.
FIGURE 3. Chest X-ray images dataset sample
The data obtained were cleaned and preprocessed as needed. Deep learning methods require a large amount of data to give reliable results, but sufficient data is not available for every problem, particularly in medical research. Medical datasets are sometimes expensive and time-consuming to obtain, or come with restrictive terms. The resulting overfitting problems can be mitigated by techniques such as augmentation, which can improve the accuracy of the proposed model. The augmentations used in this study were rescaling, zoom, and shearing of images. The proposed model was then trained on the prepared dataset.
Total: 284 X-ray images
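The three augmentations named above (rescaling, zoom, and shearing) can be sketched without any library as follows. This is only an illustration of the geometry: the ranges of 0.2 are assumed values, the shear is expressed as a slope factor rather than an angle, and nearest-neighbour sampling is used for simplicity. In Keras, the corresponding options are the `rescale`, `zoom_range`, and `shear_range` arguments of `ImageDataGenerator`; the settings actually used by the authors are not stated.

```python
import random


def rescale(image, max_value=255.0):
    """Map pixel intensities from [0, max_value] to [0, 1]."""
    return [[px / max_value for px in row] for row in image]


def zoom_shear(image, zoom=1.0, shear=0.0, fill=0.0):
    """Zoom and horizontally shear a 2D image about its centre.

    Each output pixel is mapped back to a source pixel (inverse mapping) and
    copied with nearest-neighbour sampling; pixels that fall outside the
    source image are set to `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward model: x' = zoom * (x + shear * y), y' = zoom * y.
            src_y = (y - cy) / zoom
            src_x = (x - cx) / zoom - shear * src_y
            iy = int(round(src_y + cy))
            ix = int(round(src_x + cx))
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = image[iy][ix]
    return out


def random_augment(image, rng, zoom_range=0.2, shear_range=0.2):
    """Rescale, then apply a random zoom and shear (ranges are assumed)."""
    zoom = rng.uniform(1.0 - zoom_range, 1.0 + zoom_range)
    shear = rng.uniform(-shear_range, shear_range)
    return zoom_shear(rescale(image), zoom=zoom, shear=shear)


# Hypothetical 3x3 grayscale patch.
patch = [[0, 128, 255],
         [64, 200, 32],
         [255, 16, 8]]
augmented = random_augment(patch, random.Random(0))
```

Because every call draws a new zoom and shear, the same X-ray yields slightly different training samples on each pass, which is how augmentation stretches a small dataset.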
For performance measurement, a confusion matrix was used, which is a table that summarizes the performance of a classification model on a test dataset. It validates the implementation of an algorithm. The accuracy (A), given in equation (4), is the fraction of all predictions that are correct. Precision (P), given in equation (1), is the fraction of positive predictions that are actually positive. Recall (R), given in equation (2), is the fraction of actual positive cases that are correctly identified. The F1-score, given in equation (3), is the harmonic mean of recall and precision.
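For reference, the standard textbook definitions of these four measures in terms of the confusion-matrix counts are written out below. The numbering follows the equation references in the text, and it is assumed that the authors used these standard forms.

```latex
\begin{align}
P   &= \frac{TP}{TP + FP} \tag{1} \\
R   &= \frac{TP}{TP + FN} \tag{2} \\
F_1 &= \frac{2 \, P \, R}{P + R} \tag{3} \\
A   &= \frac{TP + TN}{TP + TN + FP + FN} \tag{4}
\end{align}
```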
In these equations, a True Negative (TN) is a non-COVID-19 pneumonia image correctly predicted as non-COVID-19.
A False Positive (FP) is a non-COVID-19 pneumonia image incorrectly predicted as COVID-19.
A True Positive (TP) is a COVID-19 image correctly predicted as COVID-19.
A False Negative (FN) is a COVID-19 image incorrectly predicted as non-COVID-19 pneumonia.
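A minimal sketch of how these measures follow from the four counts is given below, using the standard definitions of precision, recall, F1-score, and accuracy. The counts in the example are made up for illustration and are not the results of this study; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1-score and accuracy from confusion counts.

    A measure whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for a 60-image validation set (not from the paper).
metrics = classification_metrics(tp=28, fp=1, tn=29, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting precision and recall alongside accuracy matters here because a missed COVID-19 case (FN) and a false alarm (FP) have very different consequences, and accuracy alone does not separate them.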

IV. RESULTS
In this section, the performance of the algorithm is presented, and the metrics used for its evaluation are discussed. The F1-score, precision, recall, and accuracy were calculated, and Table III presents these values. Compared with the results reported on COVID-19 datasets in previous studies, we attained a decent improvement in performance. The accuracy obtained in the validation process was 96.67%. A gradual decrease in loss was observed, as shown in Fig. 4, where the orange and blue lines show the validation loss and the training loss, respectively.