JLASER

This is the Java implementation of the LASER speech recognizer. Please send questions and comments to tpavelka(at)kiv.zcu.cz.


Features

Architecture

image:arch.png

LRec

The recorder component LRec can either record live or read from a file. For live recording, the component must provide a way to stop recording when the utterance ends. The end of the utterance is currently estimated from the duration of silence detected at the output of the decoder component.
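The silence-based stopping rule can be sketched as a small per-frame counter. This is a minimal sketch, not JLASER's code: the class `SilenceEndpointer`, the per-frame flag telling whether the decoder's best hypothesis currently ends in silence, and the choice to ignore silence that precedes any speech are all assumptions made for illustration.

```java
/** Hypothetical end-of-utterance detector driven by the decoder output (not the JLASER API). */
public class SilenceEndpointer {
    private final int maxSilenceFrames; // assumed threshold, e.g. 50 frames = 0.5 s at a 10 ms shift
    private int silenceRun = 0;         // consecutive frames whose best hypothesis ends in silence
    private boolean speechSeen = false; // assumption: leading silence never stops recording

    public SilenceEndpointer(int maxSilenceFrames) {
        this.maxSilenceFrames = maxSilenceFrames;
    }

    /**
     * Called once per frame with whether the decoder's current best hypothesis
     * ends in a silence model. Returns true when recording should stop.
     */
    public boolean update(boolean bestEndsInSilence) {
        if (!bestEndsInSilence) {
            speechSeen = true;
            silenceRun = 0;
            return false;
        }
        silenceRun++;
        return speechSeen && silenceRun >= maxSilenceFrames;
    }

    public static void main(String[] args) {
        SilenceEndpointer ep = new SilenceEndpointer(3);
        boolean[] silenceFlags = {true, true, true, true, false, false, true, true, true};
        for (int t = 0; t < silenceFlags.length; t++) {
            if (ep.update(silenceFlags[t])) {
                System.out.println("Stop recording at frame " + t); // frame 8
                break;
            }
        }
    }
}
```

Requiring some speech before counting silence keeps the recorder from stopping while the user has not started speaking yet; whether LRec behaves this way is not stated above.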

LPar

The parametrization (feature extraction) component LPar computes MFCC (Mel-frequency Cepstral Coefficients), which are a de facto standard in current speech recognizers. Both types of acoustic models need features from more than one time frame: neural networks take several concatenated frames as input, and Gaussian mixture models use the features of a single frame augmented by first- and second-order differences, which are themselves computed from the neighboring frames. For this reason, buffers that accumulate the features computed by LPar are placed between the feature extraction component and the acoustic model.
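The following is a minimal sketch of such a buffer for the GMM case, assuming the HTK-style regression formula d_t = Σ_{θ=1..Θ} θ(c_{t+θ} − c_{t−θ}) / (2 Σ θ²). The class `DeltaBuffer` is hypothetical; it shows why the buffer introduces a delay of Θ frames before the first augmented frame can be emitted. Frames at the very beginning and end of the utterance are simply not emitted here, and second-order differences would be obtained by applying the same regression to the delta coefficients.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sliding buffer that appends first-order differences (deltas) to feature frames. */
public class DeltaBuffer {
    private final int theta; // half-width of the regression window
    private final ArrayDeque<double[]> window = new ArrayDeque<>();

    public DeltaBuffer(int theta) {
        this.theta = theta;
    }

    /**
     * Pushes one frame of static features. Once 2*theta+1 frames are buffered, returns the
     * centre frame with its delta appended; otherwise returns null (not enough context yet).
     */
    public double[] push(double[] frame) {
        window.addLast(frame);
        if (window.size() > 2 * theta + 1) window.removeFirst();
        if (window.size() < 2 * theta + 1) return null;
        List<double[]> w = new ArrayList<>(window);
        double[] centre = w.get(theta);
        int dim = centre.length;
        double norm = 0;
        for (int k = 1; k <= theta; k++) norm += 2.0 * k * k;
        double[] out = new double[2 * dim];
        System.arraycopy(centre, 0, out, 0, dim); // static coefficients first
        for (int i = 0; i < dim; i++) {
            double sum = 0;
            for (int k = 1; k <= theta; k++) {
                sum += k * (w.get(theta + k)[i] - w.get(theta - k)[i]);
            }
            out[dim + i] = sum / norm; // delta coefficients appended after the statics
        }
        return out;
    }

    public static void main(String[] args) {
        DeltaBuffer buf = new DeltaBuffer(1);            // theta = 1: d_t = (c[t+1] - c[t-1]) / 2
        double[][] mfcc = {{0.0}, {1.0}, {4.0}, {9.0}};  // toy 1-dimensional feature stream
        for (double[] frame : mfcc) {
            double[] out = buf.push(frame);
            if (out != null) System.out.println(Arrays.toString(out));
        }
        // prints [1.0, 2.0] then [4.0, 4.0]
    }
}
```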

LNNM

The LASER Neural Network Module computes the posterior probabilities of phonetic units. Each phonetic unit corresponds to one neuron in the output layer of a multilayer perceptron. In most of our experiments, nine consecutive frames are used as the input to the neural network.
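A minimal sketch of how a window of frames can be turned into phone posteriors, assuming the nine-frame window is centred on the current frame (four frames of context on each side), a single sigmoid hidden layer and a softmax output layer. The class `MlpPosteriors`, the layer structure, the activation functions and the clamping at utterance edges are assumptions for illustration, not details of LNNM.

```java
import java.util.Arrays;

/** Hypothetical MLP forward pass: concatenated context frames in, phone posteriors out. */
public class MlpPosteriors {
    /** Concatenates frames t-c .. t+c into one input vector, clamping indices at utterance edges (assumed). */
    public static double[] contextWindow(double[][] frames, int t, int c) {
        int dim = frames[0].length;
        double[] x = new double[(2 * c + 1) * dim];
        for (int k = -c; k <= c; k++) {
            int idx = Math.min(Math.max(t + k, 0), frames.length - 1);
            System.arraycopy(frames[idx], 0, x, (k + c) * dim, dim);
        }
        return x;
    }

    /** Affine layer: y = W x + b. */
    public static double[] affine(double[][] w, double[] b, double[] x) {
        double[] y = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) s += w[j][i] * x[i];
            y[j] = s;
        }
        return y;
    }

    /** Softmax: turns output activations into probabilities that sum to one. */
    public static double[] softmax(double[] a) {
        double max = Arrays.stream(a).max().getAsDouble(); // subtract max for numerical stability
        double[] p = new double[a.length];
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            p[i] = Math.exp(a[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < a.length; i++) p[i] /= sum;
        return p;
    }

    /** One sigmoid hidden layer (assumed), then a softmax output with one neuron per phonetic unit. */
    public static double[] posteriors(double[] x, double[][] w1, double[] b1, double[][] w2, double[] b2) {
        double[] h = affine(w1, b1, x);
        for (int i = 0; i < h.length; i++) h[i] = 1.0 / (1.0 + Math.exp(-h[i]));
        return softmax(affine(w2, b2, h));
    }

    public static void main(String[] args) {
        double[][] frames = {{0.1}, {0.2}, {0.3}};        // toy 1-dimensional features
        double[] x = contextWindow(frames, 0, 4);         // nine frames: t-4 .. t+4
        double[][] w1 = new double[2][x.length];          // toy weights: 2 hidden units
        for (double[] row : w1) Arrays.fill(row, 0.5);
        double[][] w2 = {{1.0, -1.0}, {-1.0, 1.0}, {0.5, 0.5}}; // 3 toy phonetic units
        double[] p = posteriors(x, w1, new double[2], w2, new double[3]);
        System.out.println(Arrays.toString(p));
    }
}
```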

LGMM

The LASER Gaussian Mixture Module can compute HMM emission probabilities from HTK-trained models. State parameter tying is supported. The main limitation is that our implementation currently supports diagonal covariance matrices only.
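With a diagonal covariance matrix, each Gaussian factorizes over the feature dimensions, so its log-density reduces to a variance-weighted sum of squared differences. The sketch below computes the emission log-likelihood of one state, assuming the mixture weights, means and variances have already been read from the HTK model file. The class `DiagonalGmm` is hypothetical, and neither HTK file parsing nor state tying is shown; the precomputed constant plays the role of HTK's `<GCONST>` value, log((2π)^D ∏σ²).

```java
/** Hypothetical diagonal-covariance GMM emission log-likelihood for one HMM state. */
public class DiagonalGmm {
    private final double[] logWeights; // log mixture weights
    private final double[][] means;    // [mixture][dimension]
    private final double[][] invVars;  // 1 / sigma^2, [mixture][dimension]
    private final double[] gconst;     // D*log(2*pi) + sum(log sigma^2), like HTK's <GCONST>

    public DiagonalGmm(double[] weights, double[][] means, double[][] variances) {
        int m = weights.length;
        int d = means[0].length;
        this.logWeights = new double[m];
        this.means = means;
        this.invVars = new double[m][d];
        this.gconst = new double[m];
        for (int k = 0; k < m; k++) {
            logWeights[k] = Math.log(weights[k]);
            double g = d * Math.log(2 * Math.PI);
            for (int i = 0; i < d; i++) {
                invVars[k][i] = 1.0 / variances[k][i];
                g += Math.log(variances[k][i]);
            }
            gconst[k] = g;
        }
    }

    /** log p(x | state) = log sum_k w_k N(x; mu_k, diag(sigma_k^2)), summed with log-sum-exp. */
    public double logLikelihood(double[] x) {
        double[] comp = new double[logWeights.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < comp.length; k++) {
            double dist = 0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - means[k][i];
                dist += diff * diff * invVars[k][i]; // diagonal covariance: no cross terms
            }
            comp[k] = logWeights[k] - 0.5 * (gconst[k] + dist);
            max = Math.max(max, comp[k]);
        }
        double sum = 0;
        for (double c : comp) sum += Math.exp(c - max);
        return max + Math.log(sum);
    }

    public static void main(String[] args) {
        // single standard normal component: log N(0; 0, 1) = -0.5 * log(2*pi), approximately -0.919
        DiagonalGmm gmm = new DiagonalGmm(new double[]{1.0}, new double[][]{{0.0}}, new double[][]{{1.0}});
        System.out.println(gmm.logLikelihood(new double[]{0.0}));
    }
}
```

Summing the components with the log-sum-exp trick avoids underflow when every component density is tiny, which becomes likely with many mixtures and high-dimensional features.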

LDec

The decoder supports both relative and absolute pruning. Relative pruning (beam search) processes only the states whose score is higher than a certain percentage of the highest score. Absolute pruning first sorts the states by their scores and then keeps only a fixed maximum number of the best-scoring states from the top of the list.
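Both pruning strategies can be sketched over a list of active states. This is an illustrative sketch, not the JLASER decoder: the `State` class and the method names are hypothetical, scores are assumed to be log probabilities, and "a certain percentage of the highest score" is interpreted on linear-domain probabilities, which turns into a constant offset (a beam) in the log domain.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of relative (beam) and absolute pruning over active decoder states. */
public class Pruning {
    /** An active state with its accumulated log score (higher is better). */
    static final class State {
        final int id;
        final double logScore;

        State(int id, double logScore) {
            this.id = id;
            this.logScore = logScore;
        }
    }

    /** Relative pruning: keep states whose probability is at least `fraction` of the best one. */
    static List<State> relative(List<State> states, double fraction) {
        double best = Double.NEGATIVE_INFINITY;
        for (State s : states) best = Math.max(best, s.logScore);
        // p >= fraction * pBest  <=>  log p >= log pBest + log fraction
        double threshold = best + Math.log(fraction);
        List<State> kept = new ArrayList<>();
        for (State s : states) {
            if (s.logScore >= threshold) kept.add(s);
        }
        return kept;
    }

    /** Absolute pruning: sort best-first and keep at most maxStates states. */
    static List<State> absolute(List<State> states, int maxStates) {
        List<State> sorted = new ArrayList<>(states);
        sorted.sort((a, b) -> Double.compare(b.logScore, a.logScore));
        return new ArrayList<>(sorted.subList(0, Math.min(maxStates, sorted.size())));
    }

    public static void main(String[] args) {
        List<State> active = List.of(
                new State(0, Math.log(1.0)), new State(1, Math.log(0.5)),
                new State(2, Math.log(0.01)), new State(3, Math.log(0.2)));
        System.out.print("relative (10%):");
        for (State s : relative(active, 0.1)) System.out.print(" " + s.id); // 0 1 3
        System.out.print("\nabsolute (2):");
        for (State s : absolute(active, 2)) System.out.print(" " + s.id);   // 0 1
        System.out.println();
    }
}
```

The two criteria complement each other: the relative threshold discards states that are hopeless compared with the best one, while the absolute limit bounds the work per frame when many states have similar scores.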

Download

Examples

Number Recognition

  1. Download and unpack the necessary data (the archive includes the HMM definition, the alphabet, the HMM graph and the configuration XML file).
  2. The following class demonstrates recognizer initialization and single utterance recognition:
import jlaser.Recognizer;
import jlaser.utils.Config;
import jlaser.utils.LASERException;
public class RecognitionExample {
   public static void main(String[] args) {
      try {
         /* create the recognizer from the XML configuration file */
         Recognizer rec = new Recognizer(new Config("data/config_numbers_32mix.xml"));
         /* recognize one utterance */
         String result = rec.runLiveOnce();
         System.out.println("Recognition result: "+result);
      } catch (LASERException e) {
         e.printStackTrace();
      }
   }
}

Building HMM Graphs

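  1. Download and unpack the necessary data (the archive includes the HMM definition, the alphabet, the HMM graph and the configuration XML file).
  2. The following class demonstrates how to build an HMM graph for a small vocabulary, save it in GraphViz format and use it for recognition: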
import jlaser.Recognizer;
import jlaser.graphbuilder.TreeBuilder;
import jlaser.ldec.Graph;
import jlaser.transcript.TranscriptSimple;
import jlaser.utils.Config;
import jlaser.utils.LASERException;
public class TreeBuilderExample {
   public static void main(String[] args) {
      try {
         Recognizer rec = new Recognizer(new Config("data/config_numbers_32mix.xml"));
         /* initialize tree builder for three-state phoneme models and isolated word recognition */
         TreeBuilder tb = new TreeBuilder(rec.getAlphabet(),3,false);
         /* initialize orthographic-phonetic transcriber */
         TranscriptSimple trans = new TranscriptSimple("data/ortho-phonetic.xml");
         tb.addWord("ano", trans.getTranscript("ano"));
         tb.addWord("ne", trans.getTranscript("ne"));
         tb.addWord("nevím", trans.getTranscript("nevím"));
         /* get the HMM graph (warning: resets the tree builder) */
         Graph graph = tb.getGraph();
         /* optional: save the graph in GraphViz format */
         graph.saveGraphViz("graph.dot", tb.getAlphabet());
         /* replace the current recognition graph used by the recognizer with the new one */
         rec.setGraph(graph);
         /* recognize one utterance */
         String result = rec.runLiveOnce();
         System.out.println("Recognition result: "+result);
      } catch (LASERException e) {
         e.printStackTrace();
      }
   }
}
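The class name `TreeBuilder` suggests that the recognition graph is organized as a lexical prefix tree, in which words with a common initial phoneme sequence share nodes. The standalone sketch below illustrates only that idea; it does not use or reproduce the JLASER API, the class `PhonemePrefixTree` is hypothetical, and the phonetic transcriptions of "ano", "ne" and "nevím" are simplified assumptions rather than the output of `TranscriptSimple`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lexical prefix tree: words sharing initial phonemes share tree nodes. */
public class PhonemePrefixTree {
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        String word; // set if a word ends here; may also be an internal node
    }

    private final Node root = new Node();
    private int nodeCount = 0;

    /** Adds a word given its phonetic transcription, reusing any existing shared prefix. */
    public void addWord(String word, String[] phonemes) {
        Node n = root;
        for (String p : phonemes) {
            Node child = n.children.get(p);
            if (child == null) { // the prefix diverges here: create a new branch
                child = new Node();
                n.children.put(p, child);
                nodeCount++;
            }
            n = child;
        }
        n.word = word;
    }

    /** Returns the word that ends after exactly these phonemes, or null. */
    public String lookup(String[] phonemes) {
        Node n = root;
        for (String p : phonemes) {
            n = n.children.get(p);
            if (n == null) return null;
        }
        return n.word;
    }

    public int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        PhonemePrefixTree tree = new PhonemePrefixTree();
        // hypothetical, simplified transcriptions
        tree.addWord("ano", new String[]{"a", "n", "o"});
        tree.addWord("ne", new String[]{"n", "e"});
        tree.addWord("nevím", new String[]{"n", "e", "v", "i:", "m"});
        int statesPerPhoneme = 3; // three-state phoneme models, as in the example above
        System.out.println(tree.nodeCount() + " phoneme nodes, "
                + statesPerPhoneme * tree.nodeCount() + " HMM states");
        // prints "8 phoneme nodes, 24 HMM states" (10 phonemes without prefix sharing)
        System.out.println(tree.lookup(new String[]{"n", "e"})); // ne
    }
}
```

Here "ne" ends at an internal node that "nevím" continues through, so a word-end marker must be allowed on non-leaf nodes.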

Applications

MouseMove

A simple application that allows the user to control the mouse by voice.

image:MouseMove.png

By pronouncing the four Czech vowels "a", "i", "o" and "u", the user can move the mouse pointer (see above). The left mouse button is pressed by saying "k" and released by saying "č". See the program's sources for how the main recognition loop was modified to suit this application.
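The command handling in such a loop can be sketched as a mapping from recognized tokens to pointer actions. This sketch is hypothetical and not taken from the MouseMove sources: the class `VoiceMouse`, the step size and which vowel moves the pointer in which direction are assumptions (the actual assignment is the one shown in the screenshot); only the roles of "k" and "č" come from the description above. A real application could forward the resulting actions to `java.awt.Robot` through its `mouseMove`, `mousePress` and `mouseRelease` methods.

```java
/** Hypothetical voice-command interpreter for mouse control (pure logic, no real mouse access). */
public class VoiceMouse {
    static final int STEP = 10; // pixels per recognized vowel (assumed)
    int x;
    int y;
    boolean buttonDown = false;

    VoiceMouse(int x, int y) {
        this.x = x;
        this.y = y;
    }

    /** Applies one recognized token; returns false for tokens that are not commands. */
    boolean apply(String token) {
        switch (token) {
            case "a": x -= STEP; break; // vowel-to-direction assignment is hypothetical
            case "o": x += STEP; break;
            case "i": y -= STEP; break;
            case "u": y += STEP; break;
            case "k": buttonDown = true; break;  // press the left button
            case "č": buttonDown = false; break; // release the left button
            default: return false;
        }
        return true;
    }

    public static void main(String[] args) {
        VoiceMouse m = new VoiceMouse(100, 100);
        for (String token : new String[]{"k", "o", "o", "u", "č"}) m.apply(token);
        System.out.println(m.x + "," + m.y + " pressed=" + m.buttonDown); // 120,110 pressed=false
    }
}
```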

Download: MouseMove licence