quickie notes on Vista speech recognition through System.Speech (.NET 3.0)

Just to play around with it, I wrote a quickie app that calls the .NET 3.0 interfaces for Vista’s speech recognition.  It accepts a .wav file as a parameter and runs the recognition engine over it.

Some quick notes:

  • The default recognition engine isn’t necessarily the one you want.
  • On my system, I have 3 recognizers installed, one of which is the one from Office 2003 (still installed on the machine).  This appears to be the one I’m getting by default when I don’t explicitly specify the recognizer to use.  Calling SpeechRecognitionEngine.InstalledRecognizers() gives, for me:
    • Installed recognizer: MS-1033-61-DESK [Microsoft English (U.S.) v6.1 Recognizer]  <-- this is likely the one from Office 2003, and is the one I got by default
    • Installed recognizer: MS-1033-80-DESK [Microsoft Speech Recognizer 8.0 for Windows (English - US)]
    • Installed recognizer: MS-2057-80-DESK [Microsoft Speech Recognizer 8.0 for Windows (English - UK)]
  • When I construct the SpeechRecognitionEngine, passing “MS-1033-80-DESK” as the recognizerId string works fine and gets me the recognition I expect.
  • The API seems to assume you want to call Recognize or RecognizeAsync multiple times.
  • While looping, there’s no way to tell when to stop calling Recognize except by catching the InvalidOperationException it throws once the input runs out.  Here’s some quickie PowerShell showing 2 valid calls and then the invalid one once we reach the end of the (short) wav file.  I really hate that end-of-file is an exception, since it’s NOT an exceptional condition (at least for pre-recorded audio; I understand why it is for a microphone).
      C:\> [void][reflection.assembly]::loadwithpartialname('system.speech')
      C:\> $rec = new-object 'System.Speech.Recognition.SpeechRecognitionEngine'
      C:\> $rec.RecognizerInfo.Description
      Microsoft Speech Recognizer 8.0 for Windows (English - US)
      C:\> $rec.LoadGrammar((new-object 'System.Speech.Recognition.DictationGrammar'))
      C:\> $rec.SetInputToWaveFile('C:\Documents and Settings\admin\Desktop\downloads\test.wav')
      C:\> $rec.Recognize() | fl text,confidence

      Text : This is a test 12345678
      Confidence : 0.6058533

      C:\> $rec.Recognize() | fl text,confidence

      Text : 910 this was a test
      Confidence : 0.6938771

      C:\> $rec.Recognize() | fl text,confidence
      Exception calling "Recognize" with "0" argument(s): "No audio input is supplied to this recognizer. Use the method SetInputToDefaultAudioDevice if a microphone is connected to the system, otherwise use SetInputToWaveFile, SetInputToWaveStream or SetInputToAudioStream to perform speech recognition from pre-recorded audio."
      At line:1 char:15
      + $rec.Recognize( <<<< ) | fl text,confidence
      C:\> $error[0].exception.innerexception.gettype().name
      InvalidOperationException

  • Recognition on “bad” input may actually return null from Recognize (and, depending on the size of the file, may block for quite a while).  Here’s a normal wav file that comes with Vista.  It doesn’t have any actual speech in it.
    C:\> dir $env:windir\media\start.wav

        Directory: Microsoft.PowerShell.Core\FileSystem::C:\Windows\media

    Mode    LastWriteTime        Length Name
    ----    -------------        ------ ----
    -a---   8/23/2001 8:00 AM      1192 start.wav
    • Here’s recognition on that file returning null:
      C:\> $rec.SetInputToWaveFile("$env:windir\media\start.wav")
      C:\> $output = $rec.Recognize()
      C:\> $output -eq $null
      True
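
Going back to the first note, here is a rough sketch of picking the recognizer explicitly rather than trusting the default. The ID `MS-1033-80-DESK` is the one from the listing above and will likely differ on other machines. The branch that falls back to the default engine is an assumption added for illustration, not something the API does for you.

```powershell
# Sketch only: assumes Windows with System.Speech (.NET 3.0 or later) installed.
[void][Reflection.Assembly]::LoadWithPartialName('System.Speech')

# Recognizer ID taken from the listing above; adjust for your machine.
$wantedId = 'MS-1033-80-DESK'

# InstalledRecognizers() returns one RecognizerInfo per installed engine.
$installed = [System.Speech.Recognition.SpeechRecognitionEngine]::InstalledRecognizers()
$installed | ForEach-Object { 'Installed recognizer: {0} [{1}]' -f $_.Id, $_.Description }

$ids = $installed | ForEach-Object { $_.Id }
if ($ids -contains $wantedId) {
    # Constructor overload that takes the recognizerId string.
    $rec = New-Object System.Speech.Recognition.SpeechRecognitionEngine -ArgumentList $wantedId
} else {
    # Illustrative fallback: warn and take whatever the system default is.
    Write-Warning "Recognizer $wantedId is not installed; using the default engine."
    $rec = New-Object System.Speech.Recognition.SpeechRecognitionEngine
}

# Confirm which engine we actually ended up with.
$rec.RecognizerInfo.Description
```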

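The Recognize loop from the notes above could be wrapped up roughly as follows. This is a sketch under a few assumptions: it needs PowerShell 2.0 or later for try/catch/finally, `Get-WavTranscript` and its parameters are hypothetical names made up for this example, and the cap on consecutive null results is a safety guess rather than documented behavior. It treats the InvalidOperationException as end of input, which matches what the transcript above shows but may hide other causes of that exception.

```powershell
# Sketch only: assumes Windows with System.Speech (.NET 3.0 or later) installed.
[void][Reflection.Assembly]::LoadWithPartialName('System.Speech')

# Get-WavTranscript is a hypothetical helper name, not part of System.Speech.
function Get-WavTranscript {
    param(
        [string]$Path,            # wav file to run recognition over
        [string]$RecognizerId,    # optional, e.g. 'MS-1033-80-DESK'
        [int]$MaxEmptyPasses = 3  # assumption: give up after this many nulls in a row
    )

    if ($RecognizerId) {
        $rec = New-Object System.Speech.Recognition.SpeechRecognitionEngine -ArgumentList $RecognizerId
    } else {
        $rec = New-Object System.Speech.Recognition.SpeechRecognitionEngine
    }

    try {
        $rec.LoadGrammar((New-Object System.Speech.Recognition.DictationGrammar))
        $rec.SetInputToWaveFile($Path)

        $emptyPasses = 0
        while ($emptyPasses -lt $MaxEmptyPasses) {
            try {
                $result = $rec.Recognize()
            } catch {
                # PowerShell wraps the .NET exception; the real one is the inner exception.
                if ($_.Exception.InnerException -is [System.InvalidOperationException]) {
                    break   # treated here as "no more audio"
                }
                throw       # anything else is a genuine error
            }

            if ($result -eq $null) {
                # Nothing recognized in this pass (e.g. no speech); try again.
                $emptyPasses++
                continue
            }

            $emptyPasses = 0
            $result | Select-Object Text, Confidence
        }
    } finally {
        $rec.Dispose()
    }
}
```

Usage would look something like `Get-WavTranscript -Path 'C:\path\to\test.wav' -RecognizerId 'MS-1033-80-DESK' | fl text,confidence`, with the path being a placeholder. If catching an exception to detect end-of-file is too ugly, the async route may be worth a look: RecognizeAsync raises a RecognizeCompleted event whose arguments carry an InputStreamEnded flag, though that path is untested here.
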
    3 thoughts on “quickie notes on Vista speech recognition through System.Speech (.NET 3.0)”

    1. Hi
      I tried to convert a wav file into text but the accuracy is terrible. Is there any way to improve the accuracy?

      Thanks and Regards,
      Adarsh Nagaraj

    2. Did you ever get it to read all the input from a wav file and not have it throw exceptions all over the place?
