Handwritten Character Recognition With Own Dataset

Introduction

We train a machine learning model to recognize handwritten letters of the English alphabet, using a dataset we prepared ourselves. The trained model can then be used in other applications such as word recognition or English OCR. All the relevant code can be found here. This was the first mini-project I did during my Bachelor's, and it was done under the guidance of Mr. Amit Patel.


The Data for the Task

The data was collected and segmented manually by my team. We gathered it by having all our classmates write the alphabet in different styles. We used MATLAB to segment the individual letters, then manually sorted every image into a folder labeled with its corresponding letter.

The Structure of the Data

[English_alphabets]
            |
            |____[A]
            |    |
            |    |__1765.jpg
            |    |__1764.jpg
            |
            |____[B]
            |
            |
            |____[Z]
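
With this layout, the folder name doubles as the label. Here is a minimal sketch of how such a tree could be read into arrays. It is not the code from the original notebook: the function name `load_dataset`, the 28 x 28 target size, and the use of Pillow and NumPy are all assumptions made for illustration.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_dataset(root, size=(28, 28)):
    """Return (X, y): one flattened grayscale image per row, plus folder-name labels.

    `size` is an assumed (width, height); the original images may differ.
    """
    features, labels = [], []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for image_path in sorted(class_dir.glob("*.jpg")):
            with Image.open(image_path) as img:
                # "L" is 8-bit grayscale; resizing gives every sample
                # the same number of pixels.
                pixels = np.asarray(img.convert("L").resize(size), dtype=np.float32)
            # Flatten to one row and scale gray levels to [0, 1].
            features.append(pixels.ravel() / 255.0)
            labels.append(class_dir.name)
    return np.array(features), np.array(labels)
```

Calling `load_dataset("English_alphabets")` on the tree above would give one row per image and a label array containing "A" to "Z".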

The General Approach

The approach to solving the problem was pretty simple.

  • We load each image as an array of (height x width) dimensions. Each image contributes one row of pixel values, labeled according to the folder it came from. The gray level of each pixel is treated as a feature, which gives us a dataset ready to be modeled.
  • Keeping every pixel as a feature might be of little to no use, so we apply dimensionality reduction to the data. I used Principal Component Analysis (PCA) for this.
  • We can get a better understanding of the data by visualizing it with t-SNE.
  • That's it; the data is now ready to be modeled. Back then I used an SVM and logistic regression for classification, and later I also trained an MLP. Of these, the MLP performed best, but the SVM also gave good results.
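
The steps above can be sketched with scikit-learn roughly as follows. This is an illustration rather than the notebook's actual code: the helper names `compare_classifiers` and `tsne_embedding`, the number of PCA components, the 80/20 split, and every hyperparameter are assumptions, and `X` and `y` stand for the pixel rows and folder labels described above.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def compare_classifiers(X, y, n_components=50, seed=0):
    """Fit PCA + classifier pipelines and return held-out accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": SVC(kernel="rbf"),
        "mlp": MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        # PCA is fitted on the training split only, inside the pipeline,
        # so the test images do not influence the projection.
        pipeline = make_pipeline(PCA(n_components=n_components, random_state=seed), model)
        pipeline.fit(X_train, y_train)
        scores[name] = pipeline.score(X_test, y_test)
    return scores


def tsne_embedding(X, n_components_pca=50, seed=0):
    """Reduce with PCA first, then project to 2-D with t-SNE for plotting."""
    reduced = PCA(n_components=n_components_pca, random_state=seed).fit_transform(X)
    return TSNE(n_components=2, random_state=seed).fit_transform(reduced)
```

The 2-D points returned by `tsne_embedding` can be scatter-plotted and colored by label to see how well the letters separate, and the dictionary from `compare_classifiers` gives one accuracy per model for a side-by-side comparison.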

[Figure: accuracy]

If you want to tinker with the notebook, here is the Kaggle notebook. You can test out your own theories on it. Feel free to use the dataset as well.