Compute and Display Audio Mel-spectrogram in Python – Python Tutorial
The audio mel-spectrogram is a classic feature for deep learning. In this tutorial, we will introduce how to compute and display it using Python.
librosa.feature.melspectrogram()
This function can compute a mel-scaled spectrogram.
librosa.feature.melspectrogram(*, y=None, sr=22050, S=None, n_fft=2048, hop_length=512, win_length=None, window='hann', center=True, pad_mode='constant', power=2.0, **kwargs)
Here are some important parameters:
y : the audio time series, with shape (n,).
sr : the audio sample rate.
hop_length : the number of samples between successive frames. It affects the shape of the result.
win_length : each frame of audio is windowed by a window of this length.
From the source code, we can find the relation between hop_length and win_length is:
# By default, use the entire frame
if win_length is None:
    win_length = n_fft
# Set the default hop, if it's not already specified
if hop_length is None:
    hop_length = int(win_length // 4)
fft_window = get_window(window, win_length, fftbins=True)
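The default resolution above can be sketched as a small helper function (a paraphrase of the logic in the source snippet, not librosa's actual API):

```python
# Sketch of how the default win_length and hop_length are resolved
# (paraphrased from the librosa source above; not librosa's actual API).
def resolve_defaults(n_fft=2048, win_length=None, hop_length=None):
    if win_length is None:
        win_length = n_fft            # use the entire frame by default
    if hop_length is None:
        hop_length = win_length // 4  # default hop is a quarter window
    return win_length, hop_length

print(resolve_defaults())  # (2048, 512)
```

So with the default n_fft = 2048, the default hop_length works out to 512.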
We will use an example to explain this function.
Read a wav file
import librosa
import numpy as np

audio_file = r'D:\1481637021654134785_sep.wav'
audio_data, sr = librosa.load(audio_file, sr=8000, mono=True)
print(audio_data.shape)
In this example code, we use librosa.load() to read the audio data.
Run this code and we will get:
(182015,)
It means there are 182015 sample points in this file.
Compute Mel-spectrogram
We will use librosa.feature.melspectrogram() to compute mel-spectrogram. Here is an example:
melspectrum = librosa.feature.melspectrogram(y=audio_data, sr=sr, hop_length=512, window='hann', n_mels=256)
print(melspectrum.shape)
Run this code and we will get a matrix of shape (256, 356).
What happens to the result if we change the parameters hop_length and n_mels?
melspectrum = librosa.feature.melspectrogram(y=audio_data, sr=sr, hop_length=200, window='hann', n_mels=128)
print(melspectrum.shape)  # (128, 911)
The result will be a 128 × 911 matrix.
From the above we can see that the mel-spectrogram is a matrix of shape:
[n_mels, len(audio_data)//hop_length + 1]
For example, if n_mels = 128 and hop_length = 200:
len(audio_data)//hop_length + 1 = 182015//200 + 1 = 911
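We can check this frame-count formula with simple arithmetic (the +1 comes from the default center=True padding):

```python
# Number of mel-spectrogram frames with center=True padding:
# n_samples // hop_length + 1
def mel_frames(n_samples, hop_length):
    return n_samples // hop_length + 1

print(mel_frames(182015, 200))  # 911
print(mel_frames(182015, 512))  # 356
```

This matches the shapes printed by the two melspectrogram calls above.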
Display Mel-spectrogram
Once we have computed the Mel-spectrogram, we can display it. Here is an example:
import matplotlib.pyplot as plt
import librosa.display

fig, ax = plt.subplots()
S_dB = librosa.power_to_db(melspectrum, ref=np.max)
img = librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=sr, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')
plt.show()
The parameters of librosa.display.specshow() should be consistent with those used in librosa.feature.melspectrogram().
So we should compute the mel-spectrogram with hop_length = 512; running this code, we will get an image as follows:
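The librosa.power_to_db() call above converts the power spectrogram to decibels. The core idea is just 10 * log10(S / ref); here is a minimal sketch of it (librosa's actual implementation additionally clips small values with an amin parameter and applies a top_db floor):

```python
import numpy as np

# Minimal sketch of power-to-dB conversion: 10 * log10(S / ref).
# librosa.power_to_db additionally clips with amin and applies top_db.
def power_to_db(S, ref=1.0, amin=1e-10):
    S = np.asarray(S, dtype=float)
    return 10.0 * np.log10(np.maximum(S, amin) / ref)

print(power_to_db(np.array([1.0, 0.1, 0.01])))  # [  0. -10. -20.]
```

With ref=np.max, as in the example above, the loudest bin maps to 0 dB and everything else is negative.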
iamvishnuks/Audio2Spectrogram
This tool can be used to convert mp3 files to processable wav files, generate chunks of wavs, and generate spectrograms.
For this tool to work properly, you need to install the following packages on your machine:
These packages have been installed and tested on Ubuntu 16.04.
After installing the above packages using apt, install the Python packages required by this tool by running the command below.
pip install -r requirements.txt
Or you can simply install it with pip install audio2spectrogram
If you set the mp3towav flag, it will convert all mp3 files in the specified directory to wav; if you set the mkchunks flag, it will cut each wav file into separate 10-second files; and if you set the spectrogram flag, it will convert all wav files to their spectrograms.
python Audio2Spectrogram.py --mp3towav --mkchunks --spectrogram
audio2spectrogram --mp3towav --mkchunks --spectrogram
It will convert mp3 to wav, create chunks and generate spectrograms.
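The mkchunks step can be sketched with Python's built-in wave module. This is a hypothetical illustration of splitting a wav file into consecutive 10-second chunk files; the tool's actual implementation may differ:

```python
import wave

# Hypothetical sketch of the mkchunks step: split a wav file into
# consecutive 10-second chunk files. The tool's real code may differ.
def chunk_wav(path, chunk_seconds=10):
    with wave.open(path, 'rb') as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        n_chunks = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = path[:-4] + '_chunk%d.wav' % n_chunks
            with wave.open(out_path, 'wb') as dst:
                dst.setparams(params)
                dst.writeframes(frames)  # header is patched when dst closes
            n_chunks += 1
    return n_chunks
```

A 25-second file would yield three chunks: two of 10 seconds and a final one of 5 seconds.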
alakise/Audio-Spectrogram
Generating sound spectrograms using the short-time Fourier transform, which can be used for purposes such as sound classification by machine learning algorithms.
I needed an audio spectrogram generator for a machine learning algorithm I wanted to produce, but all the code I encountered was incomplete, outdated, or incorrect.
The common problem in all of that code was that the output was always at a fixed resolution. I've arranged the output resolution to be equal to the maximum resolution that the audio file can provide, which should make it a good fit for analysis. I also standardized the expression of sound intensity as dBFS.
Audio files are essentially records of periodically sampled sound levels. I want to touch on these notions first.
Differences in signal or sound levels are measured in decibels (dB). So a measurement of 10 dB means that one signal or sound is 10 decibels louder than another. It is a relative scale.
It is a common error to say that, for instance, the sound level of human speech is 50-60 dB. The level of human speech is 50-60 dB SPL (Sound Pressure Level), where 0 dB SPL is the reference level. 0 dB SPL is the hearing limit of an average person; anything quieter would be imperceptible. But dB SPL relates only to actual sound, not to signals.
I’ve scaled plot to dbFS, FS stands for ‘Full Scale’ and 0 dBFS is the highest signal level achievable in a digital audio file and all levels in audio files relative to this value.
Audio frequency is the speed of the sound's vibration, which determines the pitch of the sound. Even if you are not familiar with audio processing, this notion is widely known.
The sampling rate is the number of times per second that the amplitude of the signal is measured, and so has dimensions of samples per second. So, if you divide the total number of samples in the audio file by its sampling rate, you get the total duration of the file. For further information about sampling rates, search for the "Nyquist-Shannon sampling theorem".
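For example, using the 182015-sample, 8000 Hz file from the librosa tutorial above:

```python
n_samples = 182015  # sample count of the example wav file above
sr = 8000           # its sampling rate in Hz
duration = n_samples / sr
print(duration)  # 22.751875 seconds
```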
Fourier Transform and Short-Time Fourier Transform
If we evaluate a sound wave as a time-volume graph, we cannot obtain information about the frequency domain, and if we apply a Fourier transform to this wave, it loses its time domain. In short, the time representation obfuscates frequency, and the frequency representation obfuscates time. A meaningful image in terms of frequency, sound intensity, and time can therefore be obtained by applying the Fourier transform to short interval parts of the sound data; this is called the short-time Fourier transform.
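A minimal NumPy sketch of the idea: slice the signal into overlapping windowed frames and take the FFT of each frame. This is an illustration only, not this repository's actual code:

```python
import numpy as np

# Toy short-time Fourier transform: overlapping Hann-windowed frames,
# one FFT per frame. Illustration only, not this repository's code.
def stft(x, n_fft=256, hop=128):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape (n_frames, n_fft//2 + 1)

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)  # one second of a 1 kHz sine
S = stft(x)
peak_hz = np.abs(S[0]).argmax() * sr / 256
print(peak_hz)  # 1000.0
```

Each row of the result is the spectrum of one short slice of time, which is exactly what a spectrogram plots.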
The code is tested using SciPy 1.3.1, NumPy 1.17.0, and Matplotlib 3.1.1 under Windows 10 with Python 3.7 and Python 3.5. Similar versions of those libraries will probably work. It only supports mono 16-bit 44.1 kHz .wav files, but it is easy to convert audio files using certain websites.
You can run the code on the command line using:
python spectrogram.py "examples/1kHz-20dbFS.wav" l   # opens labelled output in window
python spectrogram.py "examples/1kHz-20dbFS.wav" ls  # saves labelled output
python spectrogram.py "examples/1kHz-20dbFS.wav" s   # saves unlabelled output
python spectrogram.py "examples/1kHz-20dbFS.wav"     # opens unlabelled output in window
The third argument passed on the command line can take two letters: 'l' for labelled output and 's' for save. Set your output folder in the code.
Tested with audio files in «examples» folder.
By committing your code to this repository, you agree to release it under the MIT License attached to the repository.