I first wish to declare that this tutorial is based entirely on Vik Paruchuri’s blog; I’ve only described, step by step, what I did to make his idea work on both my Linux laptop and a Raspberry Pi 2. All the files and instructions are from his GitHub repository, and I’ve just added the extra things I had to do to get his code working. A big thanks to him!
Voice recognition sounds amazing (pun intended!). A lot of research goes into this field, and the results today are very satisfying. Take, for example, Siri, Cortana and OK Google: they work well irrespective of accent, and it all seems magical. Here’s a Python library that lets you do this offline, albeit a bit slowly.
Vik’s scribe Python code uses PyAudio for capturing audio, and the recognition engine is Sphinx, a CMU project. Sphinx lets you train your own voice libraries too, so if you want Doge to be pronounced as Dog-E, or Doje, or simply Doge, it’s your wish! 🙂
For the recognizer to work, first download the Pocketsphinx (Link) and Sphinxbase (Link) libraries from SourceForge. Download the tar.gz files and extract them to a desired location. In each extracted folder, go to the python subfolder and delete pocketsphinx.c or sphinxbase.c respectively. Next, ensure you have Cython, libasound and PyAudio installed; if not, install them with
apt-get install cython libasound2-dev python-pyaudio
Now go into each of the two folders, become root (sudo su) and type, in order:
./configure
make clean all
make install
cd python
python setup.py install
There’s a high probability that the configure command will fail on some missing dependency; just read the error message to find the missing package and install it via
apt-get install #dependency_name#
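Since you have to repeat the same five steps for both Sphinxbase and Pocketsphinx (and perhaps on more than one machine, as I did), they’re easy to script. Here’s a sketch of the same commands wrapped in Python’s subprocess; the extracted folder names in the example are assumptions and depend on the versions you downloaded.

```python
import os
import subprocess

# The same steps as above, expressed as commands to run in order.
BUILD_STEPS = [
    ["./configure"],
    ["make", "clean", "all"],
    ["make", "install"],
]

def build_commands(src_dir):
    """Return (working_dir, command) pairs for building one Sphinx library."""
    cmds = [(src_dir, cmd) for cmd in BUILD_STEPS]
    # Final step: install the Python bindings from the python/ subfolder.
    cmds.append((os.path.join(src_dir, "python"),
                 ["python", "setup.py", "install"]))
    return cmds

def build(src_dir):
    """Run every build step, stopping if one fails (run this as root)."""
    for cwd, cmd in build_commands(src_dir):
        subprocess.run(cmd, cwd=cwd, check=True)

# Example (folder names are assumptions - use whatever you extracted,
# and build sphinxbase first, since pocketsphinx depends on it):
# build("sphinxbase-0.8")
# build("pocketsphinx-0.8")
```

If configure fails partway through, fix the missing dependency as described above and rerun the script; the earlier steps are harmless to repeat.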
The installation of both libraries should pretty much work after that. When all five steps are done for each, clone Vik’s GitHub repository. You then need to download a language pack: both an acoustic model and a language model. For those speaking English, I have compiled the language pack into a single Google Drive folder. For other languages, you need to download the files from this link. As mentioned in the README of the repo, the file structure should look like:
scribe
├── dict
│   └── cmu07a.dic
├── hmm
│   ├── feat.params
│   ├── feature_transform
│   ├── mdef
│   ├── means
│   ├── mixture_weights
│   ├── noisedict
│   ├── README
│   ├── transition_matrices
│   └── variances
├── lm
│   └── cmusphinx-5.0-en-us.lm.dmp
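Before running anything, it’s worth checking that the model files actually landed where the tree above says they should. Here’s a small sketch that does that check (the file list just mirrors the tree; scribe’s actual loading code may tolerate a slightly different layout):

```python
import os

# Model files the recognizer expects, relative to the scribe folder
# (taken from the directory tree above).
EXPECTED = [
    "dict/cmu07a.dic",
    "hmm/feat.params",
    "hmm/mdef",
    "hmm/means",
    "hmm/mixture_weights",
    "hmm/noisedict",
    "hmm/transition_matrices",
    "hmm/variances",
    "lm/cmusphinx-5.0-en-us.lm.dmp",
]

def missing_files(scribe_dir):
    """Return the expected model files that are not present under scribe_dir."""
    return [f for f in EXPECTED
            if not os.path.isfile(os.path.join(scribe_dir, f))]

# Example:
# print(missing_files("scribe") or "all model files present")
```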
After you’ve configured everything, it’s time to test the code. Plug in your microphone, run the recognizer, and say something. The default record period is 5 seconds (I’ve changed it to 2 in my folder). After the processing is complete (it takes 4-5 seconds), it displays what you said with good accuracy.
You can modify the dic file (or create a new one) with a text editor, then point the recognizer at it by changing the corresponding parameter in the recognizer.py file.
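The dic file is a plain-text CMU pronunciation dictionary: one word per line, followed by its phonemes. So teaching the recognizer the Doge variants from earlier is just a matter of appending lines. A sketch (the ARPAbet phoneme spellings below are my guesses, not entries from the official cmudict):

```python
def add_pronunciations(dic_path, entries):
    """Append (word, phonemes) entries to a CMU-style .dic file."""
    with open(dic_path, "a") as f:
        for word, phones in entries:
            # cmudict convention: uppercase words, alternates marked WORD(2), WORD(3)...
            f.write("%s %s\n" % (word.upper(), phones))

# Example (phonemes are my guesses):
# add_pronunciations("dict/cmu07a.dic", [
#     ("DOGE",    "D OW JH"),
#     ("DOGE(2)", "D OW G IY"),  # the "Dog-E" pronunciation
# ])
```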
I hope this works for you and you’re able to run voice recognition on your PC peacefully. I’ve tested this on PC and Raspberry Pi 2 and it just works! If it does, don’t forget to thank Vik and CMU, and do leave feedback!