Python opus to wav

ftransc 7.0.3

ftransc is a Python library for converting audio files across various formats.

Tags: Audio, Convert, ffmpeg, avconv, mp3

What is ftransc

ftransc is a Python audio conversion library. It can convert local files or files from YouTube (including YouTube playlists).

Installing ftransc

ftransc can be installed as follows:

 pip install ftransc

FFmpeg must then also be installed as follows:

 sudo apt-get install ffmpeg lame flac vorbis-tools

Examples

Example 1 — converting from MP3 to OGG:

 ftransc -f ogg filename.mp3

The output file name for the above example will be ‘filename.ogg’

Example 2 — converting from MP3 to AAC, removing original file on success, using high quality preset:

 ftransc -r -q extreme -f aac filename.mp3

Example 3 - extract the audio content from video files into the MP3 format, using the best quality preset:

 ftransc -q insane -f mp3 filename2.avi filename3.mpg filename4.vob .

Example 4 - convert all audio files inside a given folder into WMA format (this option does not recurse into child folders):

 ftransc -f wma --directory /path/to/folder_name

Example 5 - to convert all audio files (and extract the audio content from all video files) inside a given folder recursively, including all child folders, use ftransc with the ‘find’ command in a pipeline as follows:

 find /path/to/folder_name -type f -print0 | xargs -0 ftransc -f aac -q high
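
The same recursive walk can be sketched in Python. This is only an illustration: `iter_files` and `build_ftransc_command` are hypothetical helper names, and the only ftransc options used are the `-f` and `-q` flags shown in the examples above.

```python
import os
from typing import Iterator, List


def iter_files(root: str) -> Iterator[str]:
    """Yield every regular file under root, including all child folders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def build_ftransc_command(files: List[str], fmt: str = "aac",
                          quality: str = "high") -> List[str]:
    """Build the argument list for one ftransc call (hypothetical helper)."""
    return ["ftransc", "-f", fmt, "-q", quality, *files]
```

With ftransc installed, the resulting list could be passed to `subprocess.run(build_ftransc_command(list(iter_files("/path/to/folder_name"))), check=True)`.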

ftransc Quality Presets

ftransc uses quality presets called ‘insane’, ‘extreme’, ‘high’, ‘normal’, ‘low’, and ‘tiny’. These presets are selected with the ‘-q’ or ‘--quality’ option of ftransc and are defined in the ‘/etc/ftransc/presets.conf’ configuration file.

The /etc/ftransc/presets.conf presets file can be overridden by passing a custom presets file to the --presets option or, if you know what you are doing, by editing the file directly.

ftransc Metadata Tags

The following is the list of supported tags across audio formats that ftransc can encode to. N means the tag is not supported and hence is lost during conversion. Y means the tag is supported and is present on the new file after conversion:

tag	m4a	mp3	ogg	flac	wma	mpc	wav	wv
title	Y	Y	Y	Y	Y	Y	N	Y
artist	Y	Y	Y	Y	Y	Y	N	Y
album	Y	Y	Y	Y	Y	Y	N	Y
genre	Y	Y	Y	Y	Y	Y	N	Y
date	Y	Y	Y	Y	Y	Y	N	Y
tracknumber	Y	Y	Y	Y	Y	Y	N	Y
composer	Y	Y	Y	Y	Y	Y	N	N
publisher	N	Y	N	N	Y	N	N	N
lyrics	Y	Y	N	N	Y	N	N	N
album art	Y	Y	N	Y	N	N	N	N
album artist	N	N	N	N	N	N	N	N
comment	N	N	N	N	N	N	N	N
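
To check ahead of a conversion which tags would be lost, the table above can be encoded as a small lookup. This is a sketch built only from the table; `is_supported` and `lost_tags` are hypothetical helper names and are not part of ftransc.

```python
from typing import List

# Column order of the table above.
FORMATS = ["m4a", "mp3", "ogg", "flac", "wma", "mpc", "wav", "wv"]

# One row per tag; each character is the Y/N cell for the matching format.
TAG_SUPPORT = {
    "title":        "YYYYYYNY",
    "artist":       "YYYYYYNY",
    "album":        "YYYYYYNY",
    "genre":        "YYYYYYNY",
    "date":         "YYYYYYNY",
    "tracknumber":  "YYYYYYNY",
    "composer":     "YYYYYYNN",
    "publisher":    "NYNNYNNN",
    "lyrics":       "YYNNYNNN",
    "album art":    "YYNYNNNN",
    "album artist": "NNNNNNNN",
    "comment":      "NNNNNNNN",
}


def is_supported(tag: str, fmt: str) -> bool:
    """Return True if the table marks the tag as kept for the target format."""
    return TAG_SUPPORT[tag][FORMATS.index(fmt)] == "Y"


def lost_tags(source_tags: List[str], target_fmt: str) -> List[str]:
    """List the tags that the table marks as lost for the target format."""
    return [tag for tag in source_tags if not is_supported(tag, target_fmt)]
```

For example, `lost_tags(["title", "lyrics", "album art"], "ogg")` returns the tags that would not survive a conversion to OGG according to the table.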

Screenshots

The image below shows ftransc command in action on Terminal as well as the ftransc manpage ( man ftransc ):

ftransc GUI front-end, ftransc_qt:

ftransc also provides Nautilus Scripts, so you can right-click a selection of files and convert them like this:

ftransc plugin for Rhythmbox media player:

The ftransc plugin for rhythmbox media player allows one to send files from Rhythmbox music player to ftransc for conversion.

Enabling the plugin:

Converting songs with ftransc from Rhythmbox

Convert ogg byte array to wav byte array Python

I want to convert an OGG byte array (Opus codec) to a WAV byte array without saving to disk. I have downloaded the audio from the Telegram API, and it is in byte array format with an .ogg extension. I do not want to save it to the filesystem, in order to eliminate filesystem I/O latency. Currently, I save the audio file in .ogg format using the code below, which uses the Telegram API (for reference: https://docs.python-telegram-bot.org/en/stable/telegram.file.html#telegram.File.download_to_drive), and then convert it with ffmpeg:

    # listen for audio messages
    async def audio(update, context):
        newFile = await context.bot.get_file(update.message.voice.file_id)
        await newFile.download_to_drive(output_path)

subprocess.call(["ffmpeg", "-i", output_path, output_path.replace(".ogg", ".wav"), '-y'], stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)

Instead, I can use

    async def audio(update, context):
        newFile = await context.bot.get_file(update.message.voice.file_id)
        byte_array = await newFile.download_as_bytearray()

to get byte_array, and now I want this byte_array to be converted to WAV without saving to disk and without using ffmpeg. Let me know in the comments if something is unclear. Thanks! Note: I have set up a Telegram bot at the backend which listens for audio sent to a private chat, which I do manually for testing purposes.

Edit your question with a solution that writes the data to the disk. Please check the codec of the OGG container.

I am not writing the data to disk; I simply fetch the byte array using the Telegram API. The codec of the OGG container is Opus, which I found using ffprobe.
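
A rough in-memory check of the container and codec is also possible without ffprobe. The sketch below assumes that an Ogg stream starts with the capture pattern "OggS" and that an Opus stream's first packet starts with the signature "OpusHead"; the function name and the 64-byte search window are assumptions made for illustration, and this is not a full parser.

```python
def looks_like_ogg_opus(data) -> bool:
    """Heuristically check whether the bytes look like Opus audio in an Ogg container."""
    head = bytes(data[:64])  # accepts bytes or bytearray
    # "OggS" opens the first Ogg page; "OpusHead" opens the Opus identification header.
    return head[:4] == b"OggS" and b"OpusHead" in head
```

It can be called directly on the bytearray returned by `download_as_bytearray()`.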

1 Answer

We may write the OGG data to FFmpeg stdin pipe, and read the encoded WAV data from FFmpeg stdout pipe.
My following answer describes how to do it with video, and we may apply the same solution to audio.

The example assumes that the OGG data is already downloaded and stored in bytes array (in the RAM).

 --------------------  Encoded      ---------  Encoded      ------------
| Input OGG encoded  | OGG data    | FFmpeg  | WAV data    | Store to   |
| stream             | ----------> | process | ----------> | BytesIO    |
 --------------------  stdin PIPE   ---------  stdout PIPE  ------------

The implementation is equivalent to the following shell command:
Linux: cat input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
Windows: type input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav

The example uses ffmpeg-python module, but it’s just a binding to FFmpeg sub-process (FFmpeg CLI must be installed, and must be in the execution path).

Execute FFmpeg sub-process with stdin pipe as input and stdout pipe as output:

    ffmpeg_process = (
        ffmpeg
        .input('pipe:', format='ogg')
        .output('pipe:', format='wav')
        .run_async(pipe_stdin=True, pipe_stdout=True)
    )

The input format is set to ogg , the output format is set to wav (use default encoding parameters).

Assuming the audio file is relatively large, we can’t write the entire OGG data at once, because doing so (without "draining" the stdout pipe) causes the program execution to halt.

We may have to write the OGG data (in chunks) in a separate thread, and read the encoded data in the main thread.

Here is a sample for the "writer" thread:

    def writer(ffmpeg_proc, ogg_bytes_arr):
        chunk_size = 1024  # Define chunk size to 1024 bytes (the exact size is not important).
        n_chunks = len(ogg_bytes_arr) // chunk_size  # Number of chunks (without the remainder smaller chunk at the end).
        remainder_size = len(ogg_bytes_arr) % chunk_size  # Remainder bytes (assume total size is not a multiple of chunk_size).

        for i in range(n_chunks):
            # Write chunk of data bytes to stdin pipe of FFmpeg sub-process.
            ffmpeg_proc.stdin.write(ogg_bytes_arr[i*chunk_size:(i+1)*chunk_size])

        if remainder_size > 0:
            # Write remainder bytes of data bytes to stdin pipe of FFmpeg sub-process.
            ffmpeg_proc.stdin.write(ogg_bytes_arr[chunk_size*n_chunks:])

        # Close stdin pipe - closing stdin finishes encoding the data, and closes FFmpeg sub-process.
        ffmpeg_proc.stdin.close()

The "writer" thread writes the OGG data in small chunks.
The last chunk is smaller (assuming the length is not a multiple of the chunk size).

At the end, the stdin pipe is closed.
Closing stdin finishes encoding the data, and closes the FFmpeg sub-process.
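
The chunking described above can also be written as a small generator that steps through the data by slice, so the shorter last chunk needs no special case. This is a sketch; `iter_chunks` is a hypothetical helper name and is not part of the answer's code.

```python
from typing import Iterator


def iter_chunks(data: bytes, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield consecutive slices of data; the last one may be shorter."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

A writer thread could then loop with `for chunk in iter_chunks(ogg_bytes_arr): ffmpeg_proc.stdin.write(chunk)` and close stdin afterwards.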

In the main thread, we start the writer thread and read the encoded WAV data from the stdout pipe (in chunks):

    thread = threading.Thread(target=writer, args=(ffmpeg_process, ogg_bytes_array))
    thread.start()

    while thread.is_alive():
        wav_chunk = ffmpeg_process.stdout.read(1024)  # Read chunk with arbitrary size from stdout pipe
        out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".

For reading the remaining data, we may use ffmpeg_process.communicate() :

    # Read the last encoded chunk.
    wav_chunk = ffmpeg_process.communicate()[0]
    out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".

Complete code sample:

    import ffmpeg
    from io import BytesIO
    import threading


    async def download_audio(update, context):
        # This method is not used here - the audio is read from a file instead (just for testing).
        newFile = await context.bot.get_file(update.message.voice.file_id)
        bytes_array = await newFile.download_as_bytearray()
        return bytes_array


    # Equivalent Linux shell command:
    # cat input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
    # Equivalent Windows shell command:
    # type input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav

    # Writer thread - write the OGG data to FFmpeg stdin pipe in small chunks of 1 KByte.
    def writer(ffmpeg_proc, ogg_bytes_arr):
        chunk_size = 1024  # Define chunk size to 1024 bytes (the exact size is not important).
        n_chunks = len(ogg_bytes_arr) // chunk_size  # Number of chunks (without the remainder smaller chunk at the end).
        remainder_size = len(ogg_bytes_arr) % chunk_size  # Remainder bytes (assume total size is not a multiple of chunk_size).

        for i in range(n_chunks):
            # Write chunk of data bytes to stdin pipe of FFmpeg sub-process.
            ffmpeg_proc.stdin.write(ogg_bytes_arr[i*chunk_size:(i+1)*chunk_size])

        if remainder_size > 0:
            # Write remainder bytes of data bytes to stdin pipe of FFmpeg sub-process.
            ffmpeg_proc.stdin.write(ogg_bytes_arr[chunk_size*n_chunks:])

        # Close stdin pipe - closing stdin finishes encoding the data, and closes FFmpeg sub-process.
        ffmpeg_proc.stdin.close()


    if False:
        # We may assume that ogg_bytes_array is the output of the download_audio method
        # (a coroutine, so in real use it must be awaited inside an async function).
        ogg_bytes_array = download_audio(update, context)
    else:
        # The example reads the OGG data from a file (for testing).
        with open('input.ogg', 'rb') as f:
            ogg_bytes_array = f.read()

    # Execute FFmpeg sub-process with stdin pipe as input and stdout pipe as output.
    ffmpeg_process = (
        ffmpeg
        .input('pipe:', format='ogg')
        .output('pipe:', format='wav')
        .run_async(pipe_stdin=True, pipe_stdout=True)
    )

    # Open in-memory file for storing the encoded WAV file
    out_stream = BytesIO()

    # Start a thread that writes the OGG data in small chunks.
    # We need the thread because writing too much data to the stdin pipe at once causes a deadlock.
    thread = threading.Thread(target=writer, args=(ffmpeg_process, ogg_bytes_array))
    thread.start()

    # Read encoded WAV data from stdout pipe of FFmpeg, and write it to out_stream
    while thread.is_alive():
        wav_chunk = ffmpeg_process.stdout.read(1024)  # Read chunk with arbitrary size from stdout pipe
        out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".

    # Read the last encoded chunk.
    wav_chunk = ffmpeg_process.communicate()[0]
    out_stream.write(wav_chunk)  # Write the encoded chunk to the "in-memory file".
    out_stream.seek(0)  # Seek to the beginning of out_stream

    ffmpeg_process.wait()  # Wait for FFmpeg sub-process to end

    # Write out_stream to file - just for testing:
    with open('test.wav', "wb") as f:
        f.write(out_stream.getbuffer())
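
As an alternative to a hand-written writer thread, the same pipe-in, pipe-out conversion can be sketched with the standard library alone. `subprocess.run(..., input=...)` uses `communicate()` internally, which feeds stdin and drains stdout together and so avoids the deadlock described above. The function name `convert_in_memory` and the `cmd` parameter are assumptions made for illustration; the default command mirrors the shell command shown earlier and still requires the FFmpeg CLI to be installed.

```python
import subprocess
from typing import Sequence

# Same arguments as the equivalent shell command: OGG on stdin, WAV on stdout.
FFMPEG_CMD = ["ffmpeg", "-y", "-f", "ogg", "-i", "pipe:", "-f", "wav", "pipe:"]


def convert_in_memory(data, cmd: Sequence[str] = FFMPEG_CMD) -> bytes:
    """Pipe data through a sub-process and return everything it writes to stdout."""
    result = subprocess.run(
        list(cmd),
        input=bytes(data),          # accepts bytes or bytearray
        stdout=subprocess.PIPE,     # collect the converted stream in memory
        stderr=subprocess.DEVNULL,  # discard FFmpeg's log output
        check=True,                 # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling `convert_in_memory(ogg_bytes_array)` would return the WAV data as bytes, which can be wrapped in `BytesIO` if a file-like object is needed.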

How to convert speech to text in python – opus file format

I have some .opus audio files that need to be converted to text in order to run some analytics. I am aware that there is the Python SpeechRecognition package that can do this with .wav files as demonstrated in this tutorial.

Does anyone know how to convert .opus files to text, or convert .opus to .wav ?

I have tried the Python SpeechRecognition package with no success.

Solution – 1

Here is a solution which employs ffmpeg and the os library to first convert all .opus files in the specified directory to .wav , and then perform speech recognition on the resulting .wav files using the speech_recognition module:

    import os
    import speech_recognition as sr

    path = './audio-files/'
    file_type_to_convert = ".opus"
    file_type_to_recognize = ".wav"

    # Convert every .opus file in the directory to .wav using ffmpeg
    for filename in os.listdir(path):
        if filename.endswith(file_type_to_convert):
            os.system('ffmpeg -i "{}" -vn "{}"'.format(
                path + filename,
                path + filename[:-len(file_type_to_convert)] + file_type_to_recognize))

    recognizer = sr.Recognizer()  # Instantiate recognizer
    rec_output = {}  # Create dictionary to store output of speech recognized files

    # Iterate over each file of specified type to be recognized
    for file_to_recognize in os.listdir(path):
        if file_to_recognize.endswith(file_type_to_recognize):
            audio = sr.AudioFile(path + file_to_recognize)
            with audio as source:
                audio_data = recognizer.record(source)  # Read the whole file from the opened source
            # Recognize & append output
            # Note: the google recognizer is online only; recognize_sphinx is an offline option that uses the CMU Sphinx engine
            rec_output[file_to_recognize[:-len(file_type_to_recognize)]] = recognizer.recognize_google(audio_data, language='en-US')

    # Display each file's output
    for key, val in rec_output.items():
        print(key)
        print(val)

    # Output:
    # File name
    # Recognized words in each file
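
The conversion step above can also be sketched with `pathlib` and an argument list, which avoids shell quoting problems for file names that contain spaces or quotes. `build_convert_command` and `convert_all` are hypothetical helper names; the ffmpeg arguments are the same `-i` and `-vn` used in the solution, plus the `-y` flag used earlier to overwrite existing output.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(opus_path: Path) -> List[str]:
    """Build the ffmpeg argument list that converts one .opus file to .wav."""
    wav_path = opus_path.with_suffix(".wav")
    return ["ffmpeg", "-y", "-i", str(opus_path), "-vn", str(wav_path)]


def convert_all(folder: Path) -> List[Path]:
    """Convert every .opus file directly inside folder; return the .wav paths."""
    converted = []
    for opus_path in sorted(folder.glob("*.opus")):
        # check=True raises CalledProcessError if ffmpeg fails for this file.
        subprocess.run(build_convert_command(opus_path), check=True)
        converted.append(opus_path.with_suffix(".wav"))
    return converted
```

The returned paths can then be passed to `sr.AudioFile` one at a time, as in the recognition loop above.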
