Speech Recognizer

Most speech recognizers rely on two components: the acoustic model and the language model. The acoustic model describes how the individual sounds of the language, known as phonemes, are pronounced. The language model describes the grammar; for example, it gives the probability of word X being followed by word Y. We therefore have to cover both aspects to recognize ROILA.
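
To make the idea of "word X being followed by word Y" concrete, here is a minimal sketch that estimates such a probability by counting word pairs in a list of sentences. It is only an illustration: the class and method names are our own invention, and this is not how Sphinx-4 builds or stores its language model.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative bigram estimate; hypothetical helper, not part of Sphinx-4. */
final class BigramSketch {

    /** Estimates P(next | previous) as count(previous next) / count(previous followed by any word). */
    static double probability(List<String> sentences, String previous, String next) {
        Map<String, Integer> pairCounts = new HashMap<>();
        Map<String, Integer> firstCounts = new HashMap<>();
        for (String sentence : sentences) {
            String[] words = sentence.trim().toLowerCase(Locale.ROOT).split("\\s+");
            for (int i = 0; i + 1 < words.length; i++) {
                firstCounts.merge(words[i], 1, Integer::sum);
                pairCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
            }
        }
        String first = previous.toLowerCase(Locale.ROOT);
        String second = next.toLowerCase(Locale.ROOT);
        int seen = firstCounts.getOrDefault(first, 0);
        if (seen == 0) {
            return 0.0; // the word never appears with a successor in these sentences
        }
        return pairCounts.getOrDefault(first + " " + second, 0) / (double) seen;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("fosit koloke", "fosit kipupi");
        // "fosit" is followed by "koloke" in one of its two occurrences.
        System.out.println(probability(sentences, "fosit", "koloke")); // prints 0.5
    }
}
```

A real language model also smooths these counts so that unseen word pairs do not get a probability of exactly zero; the sketch leaves that out.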

The speech recognizer we use is Sphinx-4, an open source recognition platform from CMU. We reviewed a number of open source, customizable speech recognition tools and found Sphinx to be the most accurate and user friendly.

Installation of Sphinx-4 and using Eclipse to program with Sphinx-4

Extensive installation instructions are available for Sphinx-4. When downloading Sphinx-4 from SourceForge, you may be tempted to install the latest release, beta version 4. However, doing so and then running Sphinx from within Java gives a class-not-found exception for WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, which is the acoustic model class file. To have everything working smoothly, please install either beta version 3 or 2.

Moreover, you might want to use the Eclipse IDE to program with Sphinx-4. A very efficient tutorial explains the integration of Sphinx-4 and Eclipse, and the same resource also has some notes on the configuration details of Sphinx. Please follow the instructions of the online tutorial, which indicate which .jar files you must add to Eclipse for Sphinx-4 to work. You must have at least the following jar files included in your project properties:

js.jar
jsapi.jar
sphinx4.jar
WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

These jar files are found in the /lib folder of your Sphinx installation.
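
If you are unsure whether your installation contains all four files, a small check like the following can help before you configure Eclipse. This is only a sketch: the class name is hypothetical, and the default folder name "lib" is an assumption that you should replace with the path to the /lib folder of your own installation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that reports which of the required jar files are missing. */
final class JarCheck {

    /** The four jar files listed above. */
    static final List<String> REQUIRED = Arrays.asList(
            "js.jar",
            "jsapi.jar",
            "sphinx4.jar",
            "WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar");

    /** Returns the names of the required jar files that are not present in libDir. */
    static List<String> missingJars(File libDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(libDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the path of your Sphinx /lib folder as the first argument (assumed default: "lib").
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        List<String> missing = missingJars(libDir);
        if (missing.isEmpty()) {
            System.out.println("All required jar files found in " + libDir);
        } else {
            System.out.println("Missing from " + libDir + ": " + missing);
        }
    }
}
```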

However, before any recognition can take place within Eclipse, Sphinx-4 needs to be configured and modified so that it can recognize ROILA. Let's see how this can be done.

Identifying context and a list of plausible sentences

The first step is to identify the context of use. What do you want to talk about in ROILA? Let's take our example of instructing a LEGO Mindstorms NXT robot to navigate through its environment. A simple scenario would involve commands such as:

Walk Forward | fosit koloke | fosit koloke

Walk slowly | fosit kipupi | fosit kipupi

For the commands given above we also provide sample audio clips as a guide to how to pronounce them. These audio clips were successfully recognized when passed to the Sphinx-4 recognizer.

Creating the Language Model

The next step is to write the plausible ROILA sentences in a text file. Once you have written out the ROILA sentences, you can construct a language model with the Language Modelling Tool provided by Sphinx. Upload your sample sentences text file to the tool and it will generate a language model when you press the compile knowledge base button. At this point download the language model file only; it will be a file with a .lm extension (a sample .lm file of the aforementioned scenario).
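
If you prefer to generate the sentence file from code, the sketch below writes the two example commands to a text file and lists the distinct words they use. Note the assumptions: we assume the tool accepts a plain text file with one sentence per line, the file name roila-sentences.txt is hypothetical, and the words are upper-cased only to match the FOSIT example used for the dictionary on this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for preparing the sentence file. */
final class CorpusSketch {

    /** Collects the distinct words of all sentences, upper-cased and in alphabetical order. */
    static Set<String> vocabulary(List<String> sentences) {
        Set<String> words = new TreeSet<>();
        for (String sentence : sentences) {
            for (String word : sentence.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word.toUpperCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        List<String> sentences = Arrays.asList("fosit koloke", "fosit kipupi");
        Path corpus = Paths.get("roila-sentences.txt"); // hypothetical file name
        Files.write(corpus, sentences, StandardCharsets.UTF_8); // one sentence per line
        System.out.println("Wrote " + corpus.toAbsolutePath());
        System.out.println("Words used in the sentences: " + vocabulary(sentences));
    }
}
```

The word list is useful later on: every word that appears in your sentences needs an entry in the pronunciation dictionary.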

Creating the Pronunciation Dictionary

Do not download or use the pronunciation dictionary generated by the Language Modelling Tool. We will specify our own pronunciation dictionary, which states how we want the words to be pronounced. Here is our sample. Sphinx requires every word of the dictionary to be broken down into ARPABET symbols. Most linguists are familiar with the International Phonetic Alphabet (IPA) standard; ARPABET is basically an ASCII representation of the IPA symbols for the sounds of American English, so a conversion between the two is quite helpful.

For example, the word FOSIT in ARPABET would be written as F AA S IH T.

Note that by specifying our own dictionary we are free to define our own pronunciations.
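
Since the dictionary is written by hand, it is easy to slip in a symbol that is not ARPABET. The following sketch checks that a dictionary line consists of a word followed by valid symbols. The class is hypothetical, and we assume the 39 phone symbols of the CMU Pronouncing Dictionary without stress digits; the acoustic model you use ultimately decides which symbols are available.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical checker for hand-written dictionary lines such as "FOSIT F AA S IH T". */
final class DictionaryLineCheck {

    /** Phone symbols of the CMU Pronouncing Dictionary, without stress digits (an assumption). */
    static final Set<String> PHONES = new HashSet<>(Arrays.asList(
            "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
            "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
            "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
            "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH"));

    /** Returns true if the line is a word followed by at least one known phone symbol. */
    static boolean isValidEntry(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            return false; // a word alone has no pronunciation
        }
        for (int i = 1; i < parts.length; i++) {
            if (!PHONES.contains(parts[i])) {
                return false; // unknown symbol, for example a plain letter such as "O"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidEntry("FOSIT F AA S IH T")); // prints true
        System.out.println(isValidEntry("FOSIT F O S I T"));   // prints false
    }
}
```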

Setting up the XML Configuration file

You now have two files ready, whose paths can be inserted in the configuration XML file that Sphinx uses during recognition. For the aforementioned scenario you are welcome to use our configuration file. You will have to change the following two paths in the XML file:

The Dictionary configuration
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
<property name="dictionaryPath" value="file:YOUR .6D FILE PATH"/>

The Language Model configuration
<component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
<property name="location" value="file:YOUR .LM FILE PATH"/>

Other than that, the file can remain as it is; it is a standard XML file for ROILA recognition. These three files (language model, dictionary and configuration) are the crux of the recognition. Note that no recognition has taken place yet; to accomplish that, we move on to writing Java code.
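
After editing the XML file it can be reassuring to read the two paths back and see what Sphinx will be given. The sketch below does this with the standard Java XML parser rather than with Sphinx itself. The class name, the shortened sample XML and the file names in it are hypothetical; only the component and property names are taken from the configuration shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper that reads a property value out of a configuration XML string. */
final class ConfigPathCheck {

    /** Returns the value of the named property inside the named component, or null if absent. */
    static String propertyValue(String xml, String componentName, String propertyName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the configuration XML", e);
        }
        NodeList components = doc.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element component = (Element) components.item(i);
            if (!componentName.equals(component.getAttribute("name"))) {
                continue;
            }
            NodeList properties = component.getElementsByTagName("property");
            for (int j = 0; j < properties.getLength(); j++) {
                Element property = (Element) properties.item(j);
                if (propertyName.equals(property.getAttribute("name"))) {
                    return property.getAttribute("value");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Shortened, made-up configuration; a real file contains many more components.
        String xml = "<config>"
                + "<component name=\"dictionary\" type=\"edu.cmu.sphinx.linguist.dictionary.FastDictionary\">"
                + "<property name=\"dictionaryPath\" value=\"file:roila.6d\"/>"
                + "</component>"
                + "<component name=\"trigramModel\" type=\"edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel\">"
                + "<property name=\"location\" value=\"file:roila.lm\"/>"
                + "</component>"
                + "</config>";
        System.out.println("Dictionary: " + propertyValue(xml, "dictionary", "dictionaryPath"));
        System.out.println("Language model: " + propertyValue(xml, "trigramModel", "location"));
    }
}
```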

Executing speech recognition within Java

Create a normal Java project in Eclipse and use the following sample Java code as the heart of your source file. Remember to follow the settings mentioned earlier on this page for adding the relevant .jar files. You should be able to get some recognition going with this. Note that the program runs forever, with the microphone constantly on and listening for input. The result of the recognition is returned as a string, which you can use as you desire.

String resultText = result.getBestResultNoFiller();

In addition, neither Sphinx nor our sample program does any parsing. You could parse the result returned by Sphinx and determine whether it matches any of the plausible sentences.
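
One simple way to do such matching is a lookup table from each plausible sentence to its meaning. The sketch below is an assumption about how you might do it, not part of our sample program: the class name is hypothetical, and the table holds only the two commands from the scenario above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical matcher that maps a recognized ROILA sentence to its meaning. */
final class CommandMatcher {

    /** The plausible sentences of the example scenario and their English meaning. */
    static final Map<String, String> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("fosit koloke", "Walk forward");
        COMMANDS.put("fosit kipupi", "Walk slowly");
    }

    /** Lower-cases the text and collapses runs of whitespace so that comparison is forgiving. */
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Returns the meaning of the recognized text, or null if it is not a plausible sentence. */
    static String meaningOf(String resultText) {
        return COMMANDS.get(normalize(resultText));
    }

    public static void main(String[] args) {
        System.out.println(meaningOf("FOSIT  koloke")); // prints Walk forward
        System.out.println(meaningOf("koloke fosit"));  // prints null
    }
}
```

In your own program you would pass the resultText string obtained from the recognizer to meaningOf and ignore any result for which it returns null.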

To communicate with an NXT Mindstorms robot we recommend using Bluetooth, where the recognition result is sent over Bluetooth as bytes. For more details about communicating with your NXT robot we have provided a brief tutorial.
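
As a rough idea of what "sent as bytes" can look like, the sketch below turns a recognition result into a small byte message and writes it to an output stream. Everything about the message layout is our assumption for illustration (one length byte followed by the ASCII characters), and the class name is hypothetical. The Bluetooth connection itself is not shown; you would pass in whatever output stream your NXT communication library gives you, and the program on the robot would have to expect the same layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sender that frames a recognition result as bytes. */
final class ResultSender {

    /** Encodes the text as one length byte followed by its ASCII bytes (an assumed layout). */
    static byte[] encode(String resultText) {
        byte[] payload = resultText.getBytes(StandardCharsets.US_ASCII);
        if (payload.length > 255) {
            throw new IllegalArgumentException("Result too long for a one-byte length prefix");
        }
        byte[] message = new byte[payload.length + 1];
        message[0] = (byte) payload.length;
        System.arraycopy(payload, 0, message, 1, payload.length);
        return message;
    }

    /** Writes the encoded message to the given stream, for example a Bluetooth output stream. */
    static void send(OutputStream out, String resultText) throws IOException {
        out.write(encode(resultText));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // A byte array stream stands in for the real Bluetooth stream in this sketch.
        ByteArrayOutputStream fakeConnection = new ByteArrayOutputStream();
        send(fakeConnection, "fosit koloke");
        System.out.println("Sent " + fakeConnection.size() + " bytes"); // prints Sent 13 bytes
    }
}
```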

14 comments

Pingback from Try ROILA Speech Recognition | ROILA on April 29, 2010 at 11:24
J on July 17, 2010 at 01:00

Pronunciation for “fosit koloke” is incorrect
Pingback from First Attempts with Festival for Text to Speech for ROILA | ROILA on July 27, 2010 at 16:13
Pingback from PROMAGO.de Gadgets Blog » Blog Archive » Eine gemeinsame Sprache für Menschen und Roboter on July 28, 2010 at 17:32
Pradnya on January 13, 2011 at 09:00

Yeah it proved to be beneficial
Juan Antonio on March 13, 2011 at 21:37

I tried but I received the following exception:

class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
Exception in thread “main” Property exception component:’flatLinguist’ property:’acousticModel’ – component ‘wsj’ is missing

Do you have sphinx3.jar?

I don’t find a release where it included that jar

Cheers
mubin on April 20, 2011 at 09:22

Did you install the correct versions of Sphinx 4? I always faced problems if I installed something other than beta version 3 or 4 of Sphinx-4. Or there is something wrong in your XML configuration file. Are you using the sample XML file we provide? I do not have experience with Sphinx-3.
pooja on August 31, 2011 at 12:38

pls give me direct answers to my question…….
mubin on August 31, 2011 at 12:48

how can we help you (pooja)
Ashish on January 23, 2012 at 22:21

i want the code for converting the voice 2 text conversion
or idea for converting it… in any programming Lang.
madhura on January 25, 2012 at 09:07

i tried the above instructions and it worked all fine..but i want to record an audio file and then pass this audio file to the above program to recognize speech..how can i do that..? please help me out…
ima on February 4, 2012 at 19:39

I tried this code with netbeans but a lot of errors appear any one can help me??
Spasa Biops on February 20, 2014 at 23:52

I thank you for inventing a robot language that uses the simplest phonemes….I was in the process of doing my own version…..Aero Space communications via LaserCom was the goal. A universal language for Interplanetary/Intergalactic communications in the far future, that was as simple as possible. The B.I.O.P.S. stands for Biomechanically Integrated Organically Programmable Species. Would like to explain further, but the concepts involved are TS/SI for military purposes only. The ROILA use would be for A.R.M.A.D.I.L.L.O….Arrayed Robotic & Manned Aerospace Designated Intelligent Launch & Landing Organism…about to major research into your ideas and if I find something helpful, I will be glad to share.
Hồ Thúc Đông on March 4, 2014 at 04:03

I’m doing my project about voice browser and i decided to use Sphinx4, but i have to use VietNamese leaguage, i don’t know how to use it. Can you give me some advice?!!! thank u!!!
