Speech Emotion Recognition in Python

Speech Emotion Detection using SVM, Decision Tree, Random Forest, MLP, CNN with different architectures

PrudhviGNV/Speech-Emotion-Recognization

Speech Emotion Recognition using machine learning

This project is based on machine learning and deep learning: we train the models on the RAVDESS dataset, which consists of audio files labelled with basic emotions.

This project is not just about predicting emotion from speech. It also performs analytical research by applying different machine learning algorithms and neural networks with different architectures, and finally compares and analyses their results to draw useful insights.

For human beings, speech is among the most natural ways to express ourselves. Because emotions play a vital role in communication, detecting and analysing them is of vital importance in today's digital world of remote communication. Emotion detection is a challenging task because emotions are subjective; there is no common consensus on how to measure or categorize them.

Check out my Medium blog for a quick intuition and understanding.

The models discussed in this repository are MLP, SVM, Decision Tree, Random Forest, and CNN, including MLP and CNN networks with different architectures.

utilities.py — contains the feature-extraction and dataset-loading functions

loading_data.py — contains dataset loading and train/test splitting

mlp_classifier_for_SER.py — contains the MLP model code

SER_using_ML_algorithms.py — contains the SVM, Random Forest, and Decision Tree models

Speech_Emotion_Recognition_using_CNN.ipynb — contains the 1D-CNN model

NOTE: The remaining .ipynb files are the same as the files above, but shared from Google Colab.

In this project, I use the RAVDESS dataset for training.

RAVDESS contains 2452 audio files from 12 male and 12 female speakers. The lexical content (vocabulary) of the utterances is kept constant: every speaker reads only 2 statements of equal length in 8 different emotions. This dataset was chosen because its speech and song files were rated by 247 untrained American listeners on eight emotions at two intensity levels: Calm, Happy, Sad, Angry, Fearful, Disgust, and Surprise, along with a Neutral baseline for each actor.

Pro tip: if you are using Google Colab, use the Kaggle API to pull the data from Kaggle quickly and easily 🙂
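
A minimal sketch of pulling a dataset into Colab with the official kaggle package is shown below; the dataset slug and target directory are placeholders, and it assumes you have already placed your kaggle.json API token where the client can find it.

    from kaggle.api.kaggle_api_extended import KaggleApi

    # assumes kaggle.json is in ~/.kaggle/ (or KAGGLE_USERNAME / KAGGLE_KEY
    # are set as environment variables) before authenticating
    api = KaggleApi()
    api.authenticate()

    # hypothetical dataset slug -- replace with the RAVDESS upload you actually use
    api.dataset_download_files("owner/ravdess-emotional-speech-audio",
                               path="/content/ravdess", unzip=True)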

The heart of this project lies in preprocessing the audio files; if you can do that, 70% of the project is already done. We take advantage of two packages that make the task easier: LibROSA, for processing the audio and extracting features from it, and soundfile, for reading and writing audio files on disk.

The main task in preprocessing the audio files is to extract features from them:

  • MFCC (mfcc)
  • Chroma (chroma)
  • MEL Spectrogram Frequency (mel)
  • Contrast (contrast)
  • Tonnetz (tonnetz)

In this project, code related to preprocessing the dataset is written in two functions.

load_data() traverses every file in the dataset directory, extracts features from each one, and prepares the input and output data to be fed to the machine learning algorithms. Finally, it splits the dataset into 80% training and 20% testing.
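
The function relies on two helpers defined elsewhere in the script: int2emotion, which maps the third dash-separated field of a RAVDESS file name to an emotion label (the 01–08 coding listed later on this page), and AVAILABLE_EMOTIONS, the subset of emotions actually trained on. A sketch of these definitions, plus the needed imports, is shown below; the exact subset chosen in the original script is an assumption here.

    import glob
    import os
    import numpy as np
    from sklearn.model_selection import train_test_split

    # RAVDESS encodes the emotion as the third dash-separated field of the file name
    int2emotion = {
        "01": "neutral",
        "02": "calm",
        "03": "happy",
        "04": "sad",
        "05": "angry",
        "06": "fearful",
        "07": "disgust",
        "08": "surprised",
    }

    # assumed subset of emotions to keep -- adjust as needed
    AVAILABLE_EMOTIONS = {"calm", "happy", "fearful", "disgust"}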

    def load_data(test_size=0.2):
        X, y = [], []
        try:
            for file in glob.glob("/content/drive/My Drive/wav/Actor_*/*.wav"):
                # get the base name of the audio file
                basename = os.path.basename(file)
                print(basename)
                # get the emotion label from the third field of the file name
                emotion = int2emotion[basename.split("-")[2]]
                # we allow only the AVAILABLE_EMOTIONS we set
                if emotion not in AVAILABLE_EMOTIONS:
                    continue
                # extract speech features
                features = extract_feature(file, mfcc=True, chroma=True, mel=True)
                # add to data
                X.append(features)
                y.append(emotion)
        except:
            pass
        # split the data into training and testing sets and return it
        return train_test_split(np.array(X), y, test_size=test_size, random_state=7)

Below is the code snippet to extract features from each file.

    def extract_feature(file_name, **kwargs):
        """
        Extract features from the audio file `file_name`.
        Features supported:
            - MFCC (mfcc)
            - Chroma (chroma)
            - MEL Spectrogram Frequency (mel)
            - Contrast (contrast)
            - Tonnetz (tonnetz)
        e.g.: `features = extract_feature(path, mel=True, mfcc=True)`
        """
        mfcc = kwargs.get("mfcc")
        chroma = kwargs.get("chroma")
        mel = kwargs.get("mel")
        contrast = kwargs.get("contrast")
        tonnetz = kwargs.get("tonnetz")
        with soundfile.SoundFile(file_name) as sound_file:
            X = sound_file.read(dtype="float32")
            sample_rate = sound_file.samplerate
            # chroma and contrast are computed from the magnitude STFT
            if chroma or contrast:
                stft = np.abs(librosa.stft(X))
            result = np.array([])
            if mfcc:
                mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
                result = np.hstack((result, mfccs))
            if chroma:
                chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
                result = np.hstack((result, chroma))
            if mel:
                # the keyword argument y= is required by recent librosa versions
                mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
                result = np.hstack((result, mel))
            if contrast:
                contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T, axis=0)
                result = np.hstack((result, contrast))
            if tonnetz:
                tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T, axis=0)
                result = np.hstack((result, tonnetz))
        return result
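
With librosa's default settings (40 MFCCs, 12 chroma bins, 128 mel bands), the mfcc + chroma + mel combination used in load_data() yields a 180-dimensional feature vector per file. A quick sanity check, using a hypothetical file path:

    # hypothetical path to one RAVDESS recording
    features = extract_feature("/content/drive/My Drive/wav/Actor_01/03-01-01-01-01-01-01.wav",
                               mfcc=True, chroma=True, mel=True)
    print(features.shape)  # expected: (180,) = 40 MFCC + 12 chroma + 128 mel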

Let's dive further into the project.

Traditional Machine Learning Models:

The project first applies different traditional algorithms such as Decision Tree, SVM, and Random Forest.
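
A minimal sketch of how these classifiers can be fit on the extracted features is shown below; the actual hyperparameters used in SER_using_ML_algorithms.py may differ.

    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    X_train, X_test, y_train, y_test = load_data(test_size=0.2)

    models = {
        "SVM": SVC(kernel="rbf", C=10),
        "Decision Tree": DecisionTreeClassifier(),
        "Random Forest": RandomForestClassifier(n_estimators=200),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)                          # train on the 80% split
        acc = accuracy_score(y_test, model.predict(X_test))  # evaluate on the 20% split
        print(f"{name}: {acc:.3f}")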

It turns out that these algorithms do not give satisfactory results, so deep learning comes into action.

It then implements a classical neural network architecture, the MLP (see mlp_classifier_for_SER.py).
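
A sketch using scikit-learn's MLPClassifier, reusing the split from the previous snippet; the hyperparameters are illustrative only and the ones in mlp_classifier_for_SER.py may differ.

    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import accuracy_score

    # assumed hyperparameters -- a single hidden layer of 300 units
    mlp = MLPClassifier(hidden_layer_sizes=(300,), alpha=0.01,
                        batch_size=256, max_iter=500)
    mlp.fit(X_train, y_train)
    print("MLP accuracy:", accuracy_score(y_test, mlp.predict(X_test)))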

Deep learning models like the MLP tend to overfit the data, so the preferred network is the CNN, which has been a game changer in many fields and applications. To find the best CNN architecture for the available dataset, CNNs with different architectures are trained on the dataset and their accuracy is recorded. Every architecture uses the same configuration and is trained for 500 epochs.
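
As an illustration of what one of these 1D-CNN variants might look like on the 180-dimensional feature vectors, here is a minimal sketch; the exact layer counts and filter sizes tried in Speech_Emotion_Recognition_using_CNN.ipynb may differ.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout
    from tensorflow.keras.utils import to_categorical
    from sklearn.preprocessing import LabelEncoder

    # encode the string emotion labels as one-hot vectors
    le = LabelEncoder()
    y_train_cat = to_categorical(le.fit_transform(y_train))
    y_test_cat = to_categorical(le.transform(y_test))

    # Conv1D expects (timesteps, channels) input, so add a channel axis
    X_train_cnn = np.expand_dims(np.array(X_train), axis=2)
    X_test_cnn = np.expand_dims(np.array(X_test), axis=2)

    model = Sequential([
        Conv1D(64, kernel_size=5, activation="relu",
               input_shape=(X_train_cnn.shape[1], 1)),
        MaxPooling1D(pool_size=2),
        Conv1D(128, kernel_size=5, activation="relu"),
        MaxPooling1D(pool_size=2),
        Flatten(),
        Dense(128, activation="relu"),
        Dropout(0.3),
        Dense(y_train_cat.shape[1], activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(X_train_cnn, y_train_cat, epochs=500, batch_size=32,
              validation_data=(X_test_cnn, y_test_cat))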

For a better understanding of the data, and for visualizing the waveform and spectrogram of the audio files, refer to the accompanying notebook.
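
A small sketch of such a visualization with librosa is shown below (assuming librosa ≥ 0.10, where waveplot was replaced by waveshow); the file path is a placeholder.

    import numpy as np
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    # placeholder path to one recording
    y, sr = librosa.load("/content/drive/My Drive/wav/Actor_01/03-01-03-01-01-01-01.wav")

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))
    librosa.display.waveshow(y, sr=sr, ax=ax1)   # time-domain waveform
    ax1.set_title("Waveform")

    # log-frequency spectrogram in decibels
    D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    img = librosa.display.specshow(D, sr=sr, x_axis="time", y_axis="log", ax=ax2)
    ax2.set_title("Spectrogram")
    fig.colorbar(img, ax=ax2, format="%+2.0f dB")
    plt.tight_layout()
    plt.show()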

Neural networks perform better than traditional machine learning models in most cases (comparing their metrics).

Deep learning models are data-hungry, so they tend to overfit the training data (if we keep training the model, we reach 95%+ accuracy 🙂).

CNN architectures perform better than classical neural network architectures (in most cases a CNN outperforms an MLP under the same configuration).

CNNs with different architectures but the same configuration, the same learning rate, and the same number of epochs still show large differences in accuracy (see Speech_Emotion_Recognition_using_CNN.ipynb).

Hope you like this project 🙂

Speech emotion recognition implemented in Keras (LSTM, CNN, SVM, MLP)

Renovamen/Speech-Emotion-Recognition

Speech Emotion Recognition

Speech emotion recognition with LSTM, CNN, SVM, and MLP, implemented in Keras.

The feature extraction has been improved, raising recognition accuracy to around 80%. The original version is archived on the First-Version branch.

├── models/                  // model implementations
│   ├── common.py            // base class for all models
│   ├── dnn                  // neural network models
│   │   ├── dnn.py           // base class for all neural network models
│   │   ├── cnn.py           // CNN
│   │   └── lstm.py          // LSTM
│   └── ml.py                // SVM & MLP
├── extract_feats/           // feature extraction
│   ├── librosa.py           // feature extraction with librosa
│   └── opensmile.py         // feature extraction with openSMILE
├── utils/
│   ├── files.py             // organize the dataset (sort into classes, batch rename)
│   ├── opts.py              // parse command-line arguments with argparse
│   └── plot.py              // plotting (radar chart, spectrogram, waveform)
├── config/                  // configuration parameters (.yaml)
├── features/                // extracted features are stored here
├── checkpoints/             // trained model weights are stored here
├── train.py                 // train a model
├── predict.py               // predict the emotion of a given audio file with a trained model
└── preprocess.py            // data preprocessing (extract and save features from the dataset's audio)
  • TensorFlow 2 / Keras: LSTM & CNN (tensorflow.keras)
  • scikit-learn: SVM & MLP models, train/test splitting
  • joblib: saving and loading models trained with scikit-learn
  • librosa: feature extraction, waveform plots
  • SciPy: spectrograms
  • pandas: loading features
  • Matplotlib: plotting
  • NumPy
  1. RAVDESS: English, about 1500 audio clips from 24 actors (12 male, 12 female) expressing 8 different emotions (the third number in the file name indicates the emotion class): 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised.
  2. SAVEE: English, about 500 audio clips from 4 male speakers expressing 7 different emotions (the first letter of the file name indicates the emotion class): a = anger, d = disgust, f = fear, h = happiness, n = neutral, sa = sadness, su = surprise.
  3. EMO-DB: German, about 500 audio clips from 10 speakers (5 male, 5 female) expressing 7 different emotions (the second-to-last letter of the file name indicates the emotion class): N = neutral, W = angry, A = fear, F = happy, T = sad, E = disgust, L = boredom.
  4. CASIA: Chinese, about 1200 audio clips from 4 speakers (2 male, 2 female) expressing 6 different emotions: neutral, happy, sad, angry, fearful, surprised.
