Java library for: speech recognition, query processing and speech to text in 116 lines!
m00nwtchr/Speech-Recognition-Java
Yep, you heard it right: in 116 lines (not counting comments), this is the simplest implementation of a library with:
- Speech to text
- Text recognition / query processing and matching
- Text to speech to respond to users!
PS: Please note that I'm not a native English speaker, but I try to make my code as friendly to English speakers as I can.
Requirements
- Java 8 or newer
- Google Speech API key (only if you want speech to text)
- FFmpeg (optional; without it, the library can only use LINEAR16 .wav and x-flac .flac files with SpeechToText). Note: on ARM, FFmpeg integration is disabled because of some weird behaviour; I will try to fix that.
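Without FFmpeg integration, audio in other formats has to be converted to FLAC or LINEAR16 WAV before it is passed to SpeechToText. The following is a minimal sketch of one way to do that conversion by calling the `ffmpeg` command-line tool from Java. The class `FlacConverter`, its method names, and the 16 kHz mono settings are assumptions made for illustration; they are not part of this library, and the sketch assumes `ffmpeg` is on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the library. */
final class FlacConverter {

    /** Builds an FFmpeg command that converts an input file to 16 kHz mono audio. */
    static List<String> buildCommand(String input, String output) {
        return Arrays.asList(
                "ffmpeg",
                "-y",            // overwrite the output file if it already exists
                "-i", input,     // input file in any format FFmpeg can decode
                "-ar", "16000",  // resample to 16 kHz (an assumed sample rate)
                "-ac", "1",      // downmix to a single channel
                output);         // a .flac extension makes FFmpeg write FLAC
    }

    /** Runs FFmpeg and returns its exit code (0 means success). */
    static int convert(String input, String output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output))
                .inheritIO()     // show FFmpeg's own log output in this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call convert(...) to actually run FFmpeg.
        System.out.println(String.join(" ", buildCommand("question.mp3", "question.flac")));
    }
}
```

Keeping the command construction separate from the process launch makes it easy to inspect or log the exact command before running it.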
How to get a Google Speech API key?
- Go to https://groups.google.com/a/chromium.org/forum/?fromgroups#!forum/chromium-dev and click "Join".
- Go to https://console.developers.google.com/, create a new project or select an existing one, go to "APIs & Services", click "Enable APIs and services" at the top, search for "Speech API", click the result marked with a grey "Private" sign, and click "Enable".
- Then go to "Credentials" on the left side of the screen, click "Create credentials", then "API key", then click "Close" and copy your API key.
- That's it: you can now use the key with my library!
You will find examples in the examples directory!
Speech recognition AI based on FFNN in Java
viktorvano/SpeechRecognitionAI
This Speech Recognition AI converts speech to text and it can communicate with other applications, servers and hardware.
Tested on Windows and on Linux.
Android App on Google Play Store: IP Mic
- Word Routing: If a phrase is detected, forwards the whole recognized message to another server socket (IP + port)
- Word Commands: If a phrase is detected, another command is sent to a server socket (IP + port)
- Word Responses: If a phrase is detected, an appropriate reply message is sent back to the client (IP Mic for Android)
- Webhooks: You can trigger an automation in Home Assistant
- Shell Commands: You can run the same commands as in the Command Prompt on Windows or the console on Linux.
- Transfer Learning: If the vocabulary is changed, the neural network retrains faster because it reuses its previous experience. You can also simply continue training the neural network with extra data. This works even when the last hidden layers are changed, added or removed.
- High Performance: The neural network runs neurons in each layer in parallel threads for feed forward (speech recognition) and backprop (training).
- Weights Ironing: In the first hidden layer, sets to zero the weights of those connections that training has not altered.
- To do: Extend the FFT to a full FFT with real and imaginary values (currently only an FFT magnitude combining the real and imaginary parts is used)
- To do: Change the architecture to a multi-network model with a judge neural network and two-step training.
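The word routing feature described in the list above can be pictured as a lookup from trigger phrases to socket targets. The sketch below is an illustration under stated assumptions, not the application's actual code: the class `WordRouter`, the nested `Target` type, the case-insensitive substring match, and the choice to send the message as UTF-8 bytes are all hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of phrase-based routing to server sockets. */
final class WordRouter {

    /** Hypothetical route target: host and port of a listening server socket. */
    static final class Target {
        final String host;
        final int port;

        Target(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Insertion order is kept so that the first registered phrase wins.
    private final Map<String, Target> routes = new LinkedHashMap<>();

    void addRoute(String phrase, Target target) {
        routes.put(phrase.toLowerCase(), target);
    }

    /** Returns the target of the first registered phrase contained in the message. */
    Optional<Target> match(String message) {
        String lower = message.toLowerCase();
        for (Map.Entry<String, Target> entry : routes.entrySet()) {
            if (lower.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    /** Forwards the whole recognized message if a phrase matches; returns false otherwise. */
    boolean route(String message) throws IOException {
        Optional<Target> target = match(message);
        if (!target.isPresent()) {
            return false;
        }
        try (Socket socket = new Socket(target.get().host, target.get().port);
             OutputStream out = socket.getOutputStream()) {
            out.write(message.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
        return true;
    }
}
```

For example, after `addRoute("lights", new WordRouter.Target("192.168.1.10", 7000))` (an invented address), a recognized message such as "turn on the lights" would be forwarded in full to that socket.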
[Screenshots: New Training (learning from scratch); Transfer Learning (learning continuation); Weights Quicksave Feature]
When a user starts to speak, the application starts recording the audio into a buffer.
When the user stops speaking, the application stops recording and splits the recording into individual words, which are then analysed by a neural network.
Each word is processed to produce two normalized characteristics: an outer-shell amplitude (envelope) and a frequency spectrum (FFT).
These are then fed into the feed-forward neural network.
If a word is a significant match, it is added to the output buffer as text.
The neural network analyses the whole speech word by word.
When all words have been analysed, the "word routing" feature steps in and sends the analysed spoken message to the individual applications.
After this, the application listens again and the whole process repeats.
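To make the two characteristics above more concrete, here is a minimal sketch of how an amplitude envelope and a magnitude spectrum could be computed and normalized for one word. It is an illustration only: the class `WordFeatures`, the window-based peak envelope, and the use of a naive O(n²) DFT instead of a real FFT are assumptions, not the project's actual implementation.

```java
/** Hypothetical sketch of the two per-word features fed to the network. */
final class WordFeatures {

    /** Scales the values so that the largest absolute value becomes 1. */
    static double[] normalize(double[] values) {
        double max = 0;
        for (double v : values) {
            max = Math.max(max, Math.abs(v));
        }
        double[] out = new double[values.length];
        if (max == 0) {
            return out; // all zeros: nothing to scale
        }
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] / max;
        }
        return out;
    }

    /** Amplitude envelope: peak absolute amplitude inside each window of samples. */
    static double[] envelope(double[] samples, int window) {
        int count = (samples.length + window - 1) / window;
        double[] env = new double[count];
        for (int i = 0; i < samples.length; i++) {
            env[i / window] = Math.max(env[i / window], Math.abs(samples[i]));
        }
        return normalize(env);
    }

    /** Magnitude spectrum for bins 0..n/2-1, computed with a naive DFT. */
    static double[] magnitudeSpectrum(double[] samples) {
        int n = samples.length;
        double[] magnitude = new double[n / 2];
        for (int k = 0; k < magnitude.length; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += samples[t] * Math.cos(angle);
                im -= samples[t] * Math.sin(angle);
            }
            magnitude[k] = Math.hypot(re, im); // sqrt(re^2 + im^2)
        }
        return normalize(magnitude);
    }
}
```

In a setup like this, the two normalized arrays would be concatenated into one input vector for the network; a production version would replace the naive DFT with an FFT for speed.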
1.) The application can record up to 22 seconds of speech.
2.) Words are detected by their amplitude. Alternating amplitude is treated as a word; silence is not.
3.) A single word (or phrase) can be up to 2.97 seconds long.
4.) Speech is recognized word by word.
5.) If you want the speech analysed word by word, you need to separate the words with a short break.
This means that you need to speak like a sloth.
6.) If you want to speak more fluently, you can, but the neural network has a 2.97-second word (phrase) buffer.
7.) For good training data, it is recommended to have several training samples of each word you want to teach the neural network.
8.) It is also recommended to record audio artifacts (random unwanted noises such as chair sounds, typing and clicking). They should be named with an empty string "". This way the neural network will learn those sounds and will not mistake them for a spoken word.
9.) "SpeechRecognitionAI.jar" automatically creates a "res" folder in the same location if it does not exist. Inside "res", several files are generated if they do not exist, such as "database.dat", "printToConsole.dat", "topology.dat" and "wordRouting.dat".
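Points 2.) and 5.) above describe words being detected by amplitude and separated by short breaks. A minimal sketch of that idea follows. The class `WordSplitter`, the fixed amplitude threshold, and the `minSilence` gap measured in samples are assumptions chosen for illustration; the application's real detection logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of amplitude-based word detection. */
final class WordSplitter {

    /**
     * Returns one {start, end} pair (end exclusive) per detected word.
     * A word ends once at least minSilence consecutive samples stay below the threshold.
     */
    static List<int[]> split(double[] samples, double threshold, int minSilence) {
        List<int[]> words = new ArrayList<>();
        int start = -1;     // start of the current word, or -1 while in silence
        int lastLoud = -1;  // index of the most recent sample at or above the threshold
        for (int i = 0; i < samples.length; i++) {
            if (Math.abs(samples[i]) >= threshold) {
                if (start < 0) {
                    start = i;
                }
                lastLoud = i;
            } else if (start >= 0 && i - lastLoud >= minSilence) {
                words.add(new int[] {start, lastLoud + 1});
                start = -1;
            }
        }
        if (start >= 0) {
            // The recording ended in the middle of a word.
            words.add(new int[] {start, lastLoud + 1});
        }
        return words;
    }
}
```

The `minSilence` parameter is what forces the short break between words: a pause shorter than it keeps two spoken words inside one segment.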
Neural Network Visualization Screenshot Examples
Word Examples: "understand", [artifact], "hello", "hi"
Java — Speech Recognition Tutorial
FreeClimbAPI/Java-Speech-Recognition-Tutorial
This project serves as a guide to help you build an application with FreeClimb. View this tutorial on FreeClimb.com. Specifically, the project will:
- Make an outgoing Call
- Prompt the participant for a response
- Host a grammar file for speech recognition
- Use the user's response
Setting up your new app within your FreeClimb account
To get started using a FreeClimb account, follow the instructions here.
- Configure environment variables.
| ENV VARIABLE | DESCRIPTION |
| --- | --- |
| ACCOUNT_ID | Account ID, which can be found under API credentials in the Dashboard |
| API_KEY | API key, which can be found under API credentials in the Dashboard |
| TUTORIAL_APPLICATION_ID | Application IDs can be found under Apps |
| HOST | The URL where your app is being hosted (e.g. yourHostedApp.com) |
| FREE_CLIMB_PHONE_NUMBER | The FreeClimb number that is being used to make a phone call. To learn more go here |
- Provide a value for the variable `to` in speechRecognition.java. The `to` number is any phone number you wish to call. This number must be verified (for trial users) and in E.164 format.
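Before running the tutorial, it can help to check that the environment variables are set and that the number to call looks like E.164 (a plus sign followed by at most 15 digits, the first of which is not zero). The sketch below is an assumption-laden illustration, not part of the tutorial code: the class `TutorialConfig` and its methods are hypothetical, and the regular expression checks only the shape of the number, not whether it is verified or reachable.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-flight check, not part of the tutorial project. */
final class TutorialConfig {

    // E.164 shape: '+', a non-zero first digit, then up to 14 more digits.
    private static final Pattern E164 = Pattern.compile("^\\+[1-9]\\d{1,14}$");

    static boolean isE164(String number) {
        return number != null && E164.matcher(number).matches();
    }

    /** Reads an environment variable and fails fast if it is missing or empty. */
    static String requireEnv(String name) {
        String value = System.getenv(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        String[] names = {
            "ACCOUNT_ID", "API_KEY", "TUTORIAL_APPLICATION_ID", "HOST", "FREE_CLIMB_PHONE_NUMBER"
        };
        for (String name : names) {
            requireEnv(name);
        }
        String from = requireEnv("FREE_CLIMB_PHONE_NUMBER");
        if (!isE164(from)) {
            throw new IllegalStateException("FREE_CLIMB_PHONE_NUMBER is not in E.164 format");
        }
        System.out.println("Configuration looks complete.");
    }
}
```

The same `isE164` check can be applied to the `to` number before placing a call.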
Building and Running the Tutorial
$ gradle build && java -Dserver.port=3000 -jar build/libs/gs-spring-boot-0.1.0.jar
If you are experiencing difficulties, contact support.