Speech Recognizer

Most speech recognizers are built on two important parts: the acoustic model and the language model. The acoustic model describes how the individual sound units of the language, known as phonemes, are pronounced. The language model describes what the grammar looks like; for example, it gives the probability of word X being followed by word Y. We therefore have to make sure both aspects are covered for the recognition of ROILA.
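
As a rough illustration of the language model idea, the following Java sketch estimates the probability of one word following another by simple counting over a handful of ROILA sentences taken from the navigation example on this page. The class and method names are our own for illustration, and real language modelling tools typically add smoothing and back-off on top of such counts, so treat this as a sketch rather than what Sphinx computes.

```java
import java.util.Arrays;
import java.util.List;

final class BigramSketch {
    // Maximum-likelihood estimate of P(next | previous): how often `previous`
    // is directly followed by `next`, divided by how often it is followed by anything.
    static double bigramProbability(List<String> sentences, String previous, String next) {
        int previousCount = 0; // times `previous` is followed by any word
        int pairCount = 0;     // times `previous` is followed by `next`
        for (String sentence : sentences) {
            String[] words = sentence.trim().toLowerCase().split("\\s+");
            for (int i = 0; i + 1 < words.length; i++) {
                if (words[i].equals(previous)) {
                    previousCount++;
                    if (words[i + 1].equals(next)) {
                        pairCount++;
                    }
                }
            }
        }
        return previousCount == 0 ? 0.0 : (double) pairCount / previousCount;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "fosit koloke", "fosit kipupi", "fosit jimeja", "buse fosit");
        System.out.println("P(koloke | fosit) = "
                + bigramProbability(sentences, "fosit", "koloke"));
    }
}
```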

The speech recognizer that we are using is Sphinx-4, an open source recognition platform from CMU. We reviewed a number of open source and customizable speech recognition tools and found Sphinx to be the most accurate and user friendly.

Installation of Sphinx-4 and using Eclipse to program with Sphinx-4

Extensive installation instructions are available for Sphinx-4. When downloading Sphinx-4 from SourceForge, you may be tempted to install the latest release, beta version 4. However, doing so and then running Sphinx from within Java throws a class not found exception for WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, which contains the acoustic model classes. To have everything working smoothly, please install either beta version 3 or 2.

You might also want to use the Eclipse IDE to program with Sphinx-4. A very efficient tutorial explains the integration of Sphinx-4 and Eclipse, and the same resource also offers some notes on the configuration details of Sphinx. Please follow the instructions of the online tutorial, which indicate the .jar files you must add to Eclipse for Sphinx-4 to work. You must have at least the following jar files included in your project properties:

  • js.jar
  • jsapi.jar
  • sphinx4.jar
  • WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

These jar files are found in the /lib folder of your Sphinx installation.
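
If you want to confirm that the acoustic model jar really is on the classpath before starting the recognizer, a small check like the one below may help. It is only a sketch: the class name is taken from the exception message that a reader reports in the comments on this page, and it may differ in your Sphinx-4 release. The class `ClasspathCheck` and its method are our own hypothetical names.

```java
final class ClasspathCheck {
    // Returns true if the named class can be located on the current classpath.
    static boolean isClassAvailable(String className) {
        try {
            // 'false' asks the loader to locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed acoustic model class name; adjust it if your release uses another one.
        String model =
                "edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model";
        if (isClassAvailable(model)) {
            System.out.println("Acoustic model class found.");
        } else {
            System.out.println("Acoustic model class missing: check the jar files in your project.");
        }
    }
}
```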

However, before any recognition can take place within Eclipse, Sphinx-4 needs to be configured so that it can recognize ROILA. Let's see how this can be done.

Identifying context and a list of plausible sentences

The first step is to identify the context of use. What do you want to talk about in ROILA? Let's take our example of instructing a LEGO Mindstorms NXT robot to navigate through its environment. A simple scenario would involve commands such as:

Walk Forward | fosit koloke | fosit koloke
Walk slowly | fosit kipupi | fosit kipupi
Walk quickly | fosit jimeja | fosit jimeja
Run | bobuja | bobuja
Now you stop | bama buse fosit | bama buse fosit
Don’t walk | buse fosit | buse fosit
Walk Backwards | fosit nole | fosit nole
Walk Left | fosit webufo | fosit webufo
Walk Right | fosit besati | fosit besati

For the commands above we also provide sample audio clips as a guide to their pronunciation. These audio clips were successfully recognized when passed to the Sphinx-4 recognizer.

Creating the Language Model

The next step is to list the plausible ROILA sentences in a text file. Once you have written out the sentences, you can construct a language model with the Language Modelling Tool provided by Sphinx. Upload your sample sentences text file to the tool and it will generate a language model when you press the compile knowledge base button. At this point download the language model file only; it will be a file with a .lm extension (a sample .lm file of the aforementioned scenario).
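
The sentence file itself can be produced by hand, but a short Java sketch such as the following could also generate it from the commands listed earlier. It assumes the tool accepts a plain text file with one sentence per line; the file name roila-sentences.txt and the class name are hypothetical choices of ours.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class CorpusFileSketch {
    // The ROILA sentences of the navigation scenario described above.
    static final List<String> SENTENCES = Arrays.asList(
            "fosit koloke", "fosit kipupi", "fosit jimeja", "bobuja",
            "bama buse fosit", "buse fosit", "fosit nole",
            "fosit webufo", "fosit besati");

    // Builds the file content: one sentence per line (assumed input format).
    static String corpusText() {
        return String.join("\n", SENTENCES) + "\n";
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get("roila-sentences.txt"); // hypothetical file name
        Files.write(target, corpusText().getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + SENTENCES.size() + " sentences to " + target.toAbsolutePath());
    }
}
```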

Creating the Pronunciation Dictionary

Do not download or use the pronunciation dictionary generated by the Language Modelling Tool. Instead, we will specify our own pronunciation dictionary, which states how we want the words to be pronounced. Here is our sample. Sphinx requires every word of the dictionary to be broken down into ARPABET symbols. Most linguists are familiar with the International Phonetic Alphabet (IPA) standard; ARPABET is essentially an ASCII notation for the phonemes of American English, so a conversion from IPA is quite helpful.

For example the word FOSIT in ARPABET would be written as F AA S IH T

Note that by specifying our own dictionary we have the freedom to define our own pronunciations.
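
When writing dictionary entries by hand it is easy to slip in a symbol that is not an ARPABET phone. The Java sketch below checks a line of the form "WORD PHONE PHONE ..." against the 39 phone symbols of the CMU Pronouncing Dictionary. This is an illustration under assumptions: the phone set your acoustic model accepts may differ, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DictionaryLineCheck {
    // The 39 phone symbols of the CMU Pronouncing Dictionary (stress digits omitted).
    static final Set<String> PHONES = new HashSet<>(Arrays.asList(
            "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
            "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
            "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
            "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH"));

    // A line is accepted if it has a word followed by at least one known phone symbol.
    static boolean isValidEntry(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            return false; // a word without any phones is not a usable entry
        }
        for (int i = 1; i < tokens.length; i++) {
            if (!PHONES.contains(tokens[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidEntry("FOSIT F AA S IH T"));
        System.out.println(isValidEntry("FOSIT F O S I T"));
    }
}
```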

Setting up the XML Configuration file

You now have two files ready, and their paths can be inserted into the XML configuration file that Sphinx uses during recognition. For the aforementioned scenario you are welcome to use our configuration file. You will have to change the following two paths in the XML file:

In the dictionary configuration:
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
<property name="dictionaryPath" value="file:YOUR DICTIONARY FILE PATH"/>

In the language model configuration:
<component name="trigramModel" type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
<property name="location" value="file:YOUR .LM FILE PATH"/>

Other than that, the file can remain as it is, and it then serves as a standard XML configuration for ROILA recognition. These three files (the language model, the dictionary, and the XML configuration) are the crux of the recognition. Note that no recognition has taken place yet; to accomplish that we move on to writing Java code.

Executing speech recognition within Java

Create a normal Java project in Eclipse and use the following sample Java code as the heart of your source file. Remember to follow the settings mentioned earlier on this page for adding the relevant .jar files. You should be able to get some recognition going with this. Note that the program runs forever, with the microphone constantly on and listening for input. The result of the recognition is returned as a string, which you can use as you desire.

String resultText = result.getBestResultNoFiller();

In addition, no parsing is done by Sphinx or by our sample program. You could parse the result returned by Sphinx yourself and determine whether it matches any of the plausible sentences.
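
One simple way to do such parsing is a lookup table from the plausible ROILA sentences to the commands they stand for. The sketch below is not part of our sample program; the class `CommandMatcher` and its method are hypothetical, and the table is copied from the command list earlier on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CommandMatcher {
    // ROILA sentence -> English command, taken from the navigation scenario.
    private static final Map<String, String> COMMANDS = new HashMap<>();
    static {
        COMMANDS.put("fosit koloke", "Walk Forward");
        COMMANDS.put("fosit kipupi", "Walk slowly");
        COMMANDS.put("fosit jimeja", "Walk quickly");
        COMMANDS.put("bobuja", "Run");
        COMMANDS.put("bama buse fosit", "Now you stop");
        COMMANDS.put("buse fosit", "Don't walk");
        COMMANDS.put("fosit nole", "Walk Backwards");
        COMMANDS.put("fosit webufo", "Walk Left");
        COMMANDS.put("fosit besati", "Walk Right");
    }

    // Returns the matching command, or null if the text is not a plausible sentence.
    static String match(String resultText) {
        if (resultText == null) {
            return null;
        }
        // Normalize case and whitespace so small formatting differences do not matter.
        String normalized = resultText.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        return COMMANDS.get(normalized);
    }

    public static void main(String[] args) {
        System.out.println(match("fosit koloke"));
        System.out.println(match("koloke fosit"));
    }
}
```

In the sample program, the string passed to `match` would be the `resultText` obtained from the recognizer.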

To communicate with an NXT Mindstorms robot we recommend using Bluetooth, where the recognition result is sent over the Bluetooth link as bytes. For more details about communicating with your NXT robot we have provided a brief tutorial.
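
To give an idea of what sending the result as bytes could look like, the sketch below encodes the recognized text and writes it to a generic output stream. Opening the actual Bluetooth connection is left out, since it depends on the library you use on the PC and on the robot. The newline terminator, the ASCII encoding, and all names here are assumptions of ours; the receiving program must agree on the same convention.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class RecognitionPayload {
    // Encodes the recognition result as ASCII bytes, ending with a newline
    // so the receiver can tell where one command stops (assumed convention).
    static byte[] encode(String resultText) {
        return (resultText.trim() + "\n").getBytes(StandardCharsets.US_ASCII);
    }

    // Writes the encoded result to any stream, e.g. one obtained from a Bluetooth connection.
    static void send(OutputStream out, String resultText) throws IOException {
        out.write(encode(resultText));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the real Bluetooth link in this demo.
        ByteArrayOutputStream fakeLink = new ByteArrayOutputStream();
        send(fakeLink, "fosit koloke");
        System.out.println(fakeLink.size() + " bytes written");
    }
}
```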

  1. J

    Pronunciation for “fosit koloke” is incorrect

  2. Pradnya

    Yeah it proved to be beneficial

  3. Juan Antonio

    I tried but I received the following exception:

    class not found !java.lang.ClassNotFoundException: edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
    Exception in thread “main” Property exception component:’flatLinguist’ property:’acousticModel’ – component ‘wsj’ is missing

    Do you have sphinx3.jar?

    I don’t find a release where it included that jar

    Cheers

  4. mubin

    Did you install the correct versions of Sphinx 4? I always faced problems if I installed something other than beta version 3 or 4 of Sphinx-4. Or there is something wrong in your XML configuration file. Are you using the sample XML file we provide? I do not have experience with Sphinx-3.

  5. pooja

    pls give me direct answers to my question…….

  6. mubin

    how can we help you (pooja)

  7. Ashish

    i want the code for converting the voice 2 text conversion
    or idea for converting it… in any programming Lang.

  8. madhura

    i tried the above instructions and it worked all fine..but i want to record an audio file and then pass this audio file to the above program to recognize speech..how can i do that..? please help me out…

  9. ima

    I tried this code with netbeans but a lot of errors appear any one can help me??

  10. Spasa Biops

    I thank you for inventing a robot language that uses the simplest phonemes….I was in the process of doing my own version…..Aero Space communications via LaserCom was the goal. A universal language for Interplanetary/Intergalactic communications in the far future, that was as simple as possible. The B.I.O.P.S. stands for Biomechanically Integrated Organically Programmable Species. Would like to explain further, but the concepts involved are TS/SI for military purposes only. The ROILA use would be for A.R.M.A.D.I.L.L.O….Arrayed Robotic & Manned Aerospace Designated Intelligent Launch & Landing Organism…about to major research into your ideas and if I find something helpful, I will be glad to share.

  11. Hồ Thúc Đông

    I’m doing my project about voice browser and i decided to use Sphinx4, but i have to use VietNamese leaguage, i don’t know how to use it. Can you give me some advice?!!! thank u!!!
