Audible_emotion_recognition

A collaborative machine learning project that trains and tests a Random Forest model to predict the primary emotion in an input audio file. Data is cleaned and the model trained in a Jupyter Notebook using Pandas and Librosa; results are visualized with Pandas, Tableau, and JavaScript functions with Bootstrap in a dynamic HTML website.

Emotion Recognition - Machine Learning


Emotions Predictor


What is the Emotions Predictor?

Emotions Predictor is an application that lets users record and play back a short audio file; its built-in machine learning model then predicts the range of emotions in the clip along with the speaker's gender. There are options to compare the frequencies and other attributes of various emotions to see how they show up on a scale.

Why Emotions Predictor? Empathy is important because it helps us understand how others are feeling so we can respond appropriately to the situation. Not all of us are born as manipulators, fortune-tellers, or psychics with excellent empathetic skills. There are also people who have difficulty understanding emotions, such as:

  • People who have difficulty identifying and expressing emotions
  • People who have trouble identifying social cues
  • People who are hard of hearing

Emotions Predictor can come in handy for interpreting the emotions in voicemails, FBI recordings, legal disputes, alien interactions, etc.

Emotions include:

  • 02 = calm
  • 03 = happy
  • 04 = sad
  • 05 = angry
  • 06 = fearful
  • 07 = disgust
  • 08 = surprised

Technologies

  • Machine Learning
  • Jupyter Notebook / Pandas
  • JavaScript
  • Flask App
  • D3
  • HTML / CSS
  • Tableau

Extract Transform Load

  • Sourced data from CSV files of audio voices.
  • Used the librosa package to convert each audio file into 128 features, applying low-level feature extraction such as chromagrams, Mel spectrograms, MFCCs, and various other spectral and rhythmic features (see the sketch after this list).
  • Used Pandas to provide the feature data for emotions and gender as input to the models.
  • Tested RandomForestClassifier, KNeighborsClassifier, Keras deep learning, and linear regression to find the most accurate model.
  • Developed record and playback functionality, the output of which could be read by a model to predict the emotions and gender of the recorded audio.
  • Gave sample pre-recorded test clips as input to the models; their emotions were predicted successfully.
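
A minimal sketch of that extraction step, assuming librosa and NumPy are installed; the function name, the time-averaging of frames, and the `clip.wav` filename are illustrative, not taken from the project's code:

```python
import librosa
import numpy as np

def extract_features(path, n_mfcc=128):
    """Summarize an audio file as a single 128-value feature vector."""
    # librosa resamples to 22,050 Hz mono by default.
    y, sr = librosa.load(path)
    # 128 MFCCs per frame, averaged across time to get one vector per clip.
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfccs, axis=1)

print(extract_features("clip.wav").shape)  # (128,)
```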


Train and Test

Using the librosa library, we fed the WAV files through the parser to extract 128 Mel-frequency cepstral coefficient (MFCC) features per file. MFCCs represent the audio clip as a nonlinear "spectrum-of-a-spectrum". In an MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system.

This process allowed each audio file to be represented by data that could then be fed into various machine learning models for training and testing. We then used the same parsing method to take a user's uploaded file and break it down into MFCC data for the model to predict the makeup of emotions in the person's voice.

Through testing the models, we determined that for our purpose the Random Forest Classifier produced the most accurate predictions and was the model that did not overfit.
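
A minimal sketch of the train/test step, reusing the `extract_features` helper sketched above; the `data/<emotion>/<clip>.wav` layout, split ratio, and hyperparameters are illustrative assumptions, not the project's actual settings:

```python
import numpy as np
from pathlib import Path
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Build the feature matrix and label vector from an assumed folder layout
# where each emotion has its own subfolder of WAV clips.
paths = sorted(Path("data").glob("*/*.wav"))
X = np.vstack([extract_features(p) for p in paths])
y = np.array([p.parent.name for p in paths])  # folder name = emotion label

# Hold out 20% of the clips for testing; stratify to keep the
# emotion classes balanced across the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```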

Model Results

(Figures: Emotion Random Forest, Gender Random Forest, Emotion Deep Learning, Gender Deep Learning, KNN Model, and an accuracy comparison chart highlighting the winning model.)

Visualization

The Emotion Predictor runs as a lightweight Flask application; in its current edition it neither needs nor contains a database. If we were to expand the project and store users' uploaded files as additional training inputs for the model, a database would then be needed.

The application uses the built-in functionality of HTML5 to let the user's browser record and store the audio file. Once recorded, the file is passed to the Flask app via a POST request, where it is run through the audio parser to break it into features, and then through the models. The application uses two different models, one for emotion and one for gender; the predictions, along with the probability of each emotion and gender, are returned as a JSON dictionary that is used to generate the Plotly bar chart of emotions.
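
A minimal sketch of such a route; the `/predict` endpoint name, the `audio` form field, the pickled model filenames, and the reuse of the `extract_features` helper sketched earlier (librosa can read file-like objects) are all assumptions, not the project's actual code:

```python
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
emotion_model = joblib.load("emotion_model.pkl")  # hypothetical filenames
gender_model = joblib.load("gender_model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    # The browser posts the recorded clip as a multipart form file.
    clip = request.files["audio"]
    features = extract_features(clip).reshape(1, -1)
    # Map each class label to its predicted probability.
    emotions = {c: float(p) for c, p in
                zip(emotion_model.classes_,
                    emotion_model.predict_proba(features)[0])}
    genders = {c: float(p) for c, p in
               zip(gender_model.classes_,
                   gender_model.predict_proba(features)[0])}
    return jsonify({"emotion": emotions, "gender": genders})
```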

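The bar chart itself can be built from that dictionary; a minimal sketch using plotly.graph_objects, with made-up probabilities for illustration (the project may instead build the chart in JavaScript):

```python
import plotly.graph_objects as go

# Illustrative prediction dictionary of the shape returned by the route above.
pred = {"calm": 0.10, "happy": 0.52, "sad": 0.06, "angry": 0.10,
        "fearful": 0.09, "disgust": 0.08, "surprised": 0.05}

fig = go.Figure(go.Bar(x=list(pred), y=list(pred.values())))
fig.update_layout(title="Predicted emotion probabilities",
                  yaxis_title="probability")
fig.show()
```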

Test Clips

A similar function is used on the Test Clips page, but the data is stored as a JSON file to avoid the lag of calling eight audio files in a row to build the dictionary on each session. Instead, the page loads a stored dictionary and reads the specific values for each of the eight Bootstrap cards, allowing for a much more seamless user experience.
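
A minimal sketch of precomputing that dictionary, reusing the `extract_features` helper and emotion model from the earlier sketches; the `static/test_clips` folder and output path are assumptions:

```python
import json
from pathlib import Path

results = {}
for wav in Path("static/test_clips").glob("*.wav"):
    feats = extract_features(wav).reshape(1, -1)
    probs = emotion_model.predict_proba(feats)[0]
    results[wav.stem] = {c: float(p) for c, p in
                         zip(emotion_model.classes_, probs)}

# The page then fetches this file instead of re-parsing audio per session.
Path("static/test_clips.json").write_text(json.dumps(results, indent=2))
```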

Sound cards made with Bootstrap cards

(Screenshots: test clip 1, test clip 2)

Alexis Bar Chart


Emotion Visualizations made using Pandas

(Charts: calm, angry)

Tableau


Learnings

Model Accuracy

  • The gender data contained more female samples than male when we combined the two datasets. The emotion data originally included neutral, calm, happy, sad, fearful, angry, disgust, and surprised. We combined neutral and calm because they were similar.

  • This was causing our model to predict female and calm more often than it should. We took steps to remove the extra female data and eliminated calm from our data, which resulted in a more accurate model (a rebalancing sketch follows below).
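
A minimal sketch of that rebalancing, assuming the cleaned features live in a Pandas DataFrame `df` with "gender" and "emotion" columns; the column names and the downsampling approach are assumptions, not the project's actual code:

```python
import pandas as pd

# Drop the calm clips (neutral had already been merged into calm).
df = df[df["emotion"] != "calm"]

# Downsample female rows so both genders contribute equally.
n_male = (df["gender"] == "male").sum()
female = df[df["gender"] == "female"].sample(n=n_male, random_state=42)
df_balanced = pd.concat([df[df["gender"] == "male"], female])
```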

Predicting Gender

  • Even with an accurate model, it is still difficult to predict someone's gender based on their voice.

  • The only difference between the male and female larynx is size.

  • Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating these studies while accounting for influential factors such as stimulus type, gender-balanced samples, and the number of encoders, decoders, and emotional categories.

Run Flask App

To deploy our Flask app, follow the steps below:

  • Step 1: Git clone our repository to your local machine.

  • Step 2: From the repository folder, in your terminal, type python app.py to launch the site.

Heroku

Emotion Predictor

Resources


RAVDESS Dataset: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NC-SA 4.0. Link

TESS Dataset: Pichora-Fuller, M. Kathleen; Dupuis, Kate, 2020, "Toronto emotional speech set (TESS)", Link, Scholars Portal Dataverse, V1


HTML Template

Google Doc Presentation

Contact


Elliott McFarland * Celeste Muniz * Saroja Shreenivasan * Sai Prasanna * Tim Samson * Sara Simoes

Thanks!