Deploying Audio Classification Flask App (Part-II)

Sayan Chakraborty
3 min read · Jul 16, 2021


In Part-I, I demonstrated the data gathering, preparation, and modeling process, and explained how phonemes can be extracted from speech.

In this part, I’ll go through the sample Python Flask app that helped the patient identify his lost voice.

Solution Aspects:

  • The solution should be hosted on a single web server with a microphone recording facility.
  • It should be a single-page, AJAX-based solution that records speech and displays words.
  • A pretrained model is to be used. Incremental training may be required if the patient's speech keeps changing.
  • The patient's Bag of Words (BOW) is to be used while predicting words.
  • Probabilistic calculations are to be used while predicting words.
  • The 2 most probable letters are to be considered for each recorded sound.

Solution:

For the Flask web app with an AJAX page:

Load the CNN model while initializing the app.
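A minimal sketch of loading the model once at startup might look like the following. The model path, the `load_cnn_model` helper and the `/health` route are all hypothetical names used for illustration; the loader here returns a stub so the sketch runs without a trained model, whereas a real app would presumably call its framework's own loader at that point.

```python
from flask import Flask, jsonify

# Hypothetical path; the article does not name the saved model file.
MODEL_PATH = "model/letters_cnn.h5"


def load_cnn_model(path):
    """Placeholder loader (an assumption, not the author's code).

    With a Keras model this could be tensorflow.keras.models.load_model(path).
    Here it returns a stub so the sketch runs without a trained model.
    """
    return {"path": path, "n_classes": 26}


app = Flask(__name__)

# Loaded once when the module is imported, so every request reuses the same
# in-memory model instead of reading it from disk again.
model = load_cnn_model(MODEL_PATH)


@app.route("/health")
def health():
    # Hypothetical endpoint that only confirms the model is in memory.
    return jsonify(model_loaded=model is not None, n_classes=model["n_classes"])
```

Loading at module level keeps the per-request cost limited to preprocessing and prediction, which matters for a page that records and predicts interactively.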

Start recording and preprocess the sound: apply noise reduction and split the recording into equal chunks.
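The preprocessing step could be sketched roughly as below, assuming the recording is already a 1-D NumPy array of samples. The amplitude gate is only a crude stand-in for whatever noise reduction the app actually applies, and the function names, threshold and chunk length are illustrative assumptions.

```python
import numpy as np


def noise_gate(signal, threshold=0.02):
    """Crude stand-in for noise reduction: zero out samples below a threshold."""
    gated = signal.copy()
    gated[np.abs(gated) < threshold] = 0.0
    return gated


def split_into_chunks(signal, sample_rate, chunk_seconds=1.0):
    """Split a 1-D signal into equal-length chunks, dropping the ragged tail."""
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(signal) // chunk_len
    trimmed = signal[: n_chunks * chunk_len]
    # One row per chunk, each row holding chunk_len samples.
    return trimmed.reshape(n_chunks, chunk_len)
```

Each row of the result is then treated as one spoken sound and passed on to feature extraction.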

Find the MFCC features of each chunk and calculate the predicted probabilities using the pretrained model.
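One way this step might be written is shown below. It assumes librosa for the MFCC computation (the article does not name the library) and averages the coefficients over time, which may differ from the features the model was actually trained on. `predict_fn` is a hypothetical callable standing in for the model's prediction method; a CNN may also need the features reshaped to its input layout.

```python
import numpy as np


def mfcc_features(chunk, sample_rate, n_mfcc=40):
    """Mean MFCC vector for one chunk, using librosa (imported lazily)."""
    import librosa  # assumed dependency, not confirmed by the article

    mfcc = librosa.feature.mfcc(y=chunk, sr=sample_rate, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over time frames -> shape (n_mfcc,)


def predict_letter_probabilities(chunks, sample_rate, predict_fn,
                                 extract_fn=mfcc_features):
    """Return one row of class probabilities per chunk.

    predict_fn takes a (n_chunks, n_features) array and returns a
    (n_chunks, 26) array, e.g. the prediction method of a trained model.
    """
    features = np.stack([extract_fn(chunk, sample_rate) for chunk in chunks])
    return np.asarray(predict_fn(features))
```

Passing the extractor and predictor in as arguments keeps the sketch independent of any particular audio library or model framework.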

Bag of Words:

The idea here is that patients have their own vocabulary, ordered by term frequency. I used the 20K most frequent English dictionary words for this application, but the best results should come from restricting the BOW list to about 1,000 words. Paralyzed patients normally don't take part in discussions of world politics and the like. They are concerned with their daily routine, which can be covered by a 1,000-word model.
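As an illustration, a frequency-ordered word list of this kind could be built from sample sentences as sketched below. The function name and the idea of deriving the list from a patient's own sentences are assumptions for the example; the app described here starts from a general 20K-word English list.

```python
import re
from collections import Counter


def build_bag_of_words(sentences, max_words=1000):
    """Return the most frequent words, most common first."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase and keep alphabetic runs only, since the model predicts letters.
        counts.update(re.findall(r"[a-z]+", sentence.lower()))
    return [word for word, _ in counts.most_common(max_words)]
```

For example, `build_bag_of_words(["I need water", "water please", "need water now"], max_words=2)` keeps only the two most frequent words, `water` and `need`.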

After a fixed-time recording, we identify the 2 most probable letters for each phoneme. Remember that we predict the letters with a 26-letter softmax function, so for a better probabilistic calculation it sometimes helps to keep close calls, that is, the runner-up letter as well as the top one.
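Picking the two most probable letters from each softmax row could look like this. It is a sketch that assumes the 26 classes are ordered `a` to `z`, which the article does not state; the function name is illustrative.

```python
import string


def top_k_letters(prob_rows, k=2):
    """For each row of 26 probabilities, return the k most probable letters."""
    letters = string.ascii_lowercase  # assumes class index 0 is 'a', 25 is 'z'
    result = []
    for row in prob_rows:
        # Indices sorted by descending probability.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        result.append([letters[i] for i in ranked[:k]])
    return result
```

A row whose two largest probabilities sit at indices 2 and 10 would yield `['c', 'k']`, so a close call between similar sounds is kept for the word-matching step.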

Then we try all combinations of these candidate letters against the BOW list and keep the words in which a certain percentage (50%) of the letters match.
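One possible reading of this matching step is sketched below. Rather than enumerating every combination of candidate letters, it counts, for each BOW word, the positions where the word's letter is among that position's candidates, which gives the best match over all combinations. It also assumes a candidate word has as many letters as there are recorded chunks; both the function name and that length rule are assumptions for the example.

```python
def match_words(candidates, bag_of_words, min_ratio=0.5):
    """Return BOW words whose letters match the candidates at >= min_ratio positions.

    candidates: one list of candidate letters per recorded sound.
    bag_of_words: words ordered from most to least frequent.
    """
    n = len(candidates)
    if n == 0:
        return []
    scored = []
    for rank, word in enumerate(bag_of_words):
        if len(word) != n:
            continue  # assumption: one letter per recorded chunk
        hits = sum(1 for letter, options in zip(word, candidates) if letter in options)
        ratio = hits / n
        if ratio >= min_ratio:
            # Sort by best match first, then by frequency rank in the BOW list.
            scored.append((-ratio, rank, word))
    return [word for _, _, word in sorted(scored)]
```

With candidates `[['c', 'k'], ['a', 'e'], ['t', 'd']]` and the list `["the", "cat", "car", "dog", "bed", "key"]`, the full match `cat` comes first, followed by the partial matches in frequency order.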

Possible words of recording

GitHub:

