The Blog

Speech recognition for .NET?
November 22nd, 2011 | Artificial Intelligence | James

- There is no magical System Speech Programming Guide for .NET Framework 4.0. Not according to Google. Not according to Bing. Not according to Bing for MSDN.

I am currently working on a little prototype of a game in which you command and steer a tank over a LAN via voice input.

There are a lot of APIs out there that can handle voice input, but the documentation and compatibility problems seem exponential.

Hopefully this article can help someone figure out how to achieve this without going through an endless stream of COM code!


On Windows 7 (and almost certainly Vista) you can tinker with “Speech Recognition” under the hood of the Control Panel (Control Panel\Ease of Access\Speech Recognition).

The library this article is based on uses this technology at its core. It is worth noting that the “Train your computer to better understand you” option takes around 5 minutes, but it really made my life easier, so do it right now!

Free library!

Under the “System.Speech.Recognition” namespace you will find all the tools required to set up, capture and detect phrases of all kinds.

Just add a reference to the System.Speech assembly in a C# project (this article focuses on a console application) and we'll be on our way.

Stages of capturing a voice

Although it is fairly easy to comprehend, I think it is nice to have a list of the steps we will use to recognise voice:

  1. I am using a computer that is set up for England, and I am guessing Microsoft’s speech input capturer expects American English… This was a problem for me, as it would throw an exception telling me I am in the wrong culture!
    To solve this, the first step is to force the thread you are working on into a supported culture, unless yours is supported already.
  2. This article is all about responding to “valid” input. To make clear to ourselves what is valid, we simply add a bunch of “phrases” the engine needs to differentiate. I believe the more phrases you add, the longer the input takes to process…
  3. As the technology is based on Windows Speech Recognition, we need to get the “profile” (that thing you have supposedly trained) from the operating system: the one that matches your “culture”.
  4. Build the grammar tree once during initialisation, rather than while recognising, to put less strain on the real-time loop.
  5. Ensure that the microphone you have set as your “primary” is the one you want to capture from.
  6. Wire up the events you want to listen for.
  7. Kick off the speech recording asynchronously!
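
Before forcing a culture as in step 1, it can help to see which recognizers are actually installed on the machine. The following is only a rough sketch of that check: `RecognizerProbe` and `PickCulture` are hypothetical names made up for illustration, and falling back to en-US is an assumption based on the exception described above, not documented behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;

public static class RecognizerProbe
{
    // Hypothetical helper: returns the preferred culture if a recognizer
    // supports it, otherwise the fallback if supported, otherwise null.
    public static CultureInfo PickCulture(IEnumerable<CultureInfo> installed,
                                          CultureInfo preferred,
                                          CultureInfo fallback)
    {
        List<CultureInfo> cultures = installed.ToList();
        if (cultures.Contains(preferred)) return preferred;
        if (cultures.Contains(fallback)) return fallback;
        return null;
    }

    public static void Main()
    {
        // List every recognizer Windows knows about, with its culture.
        var recognizers = SpeechRecognitionEngine.InstalledRecognizers();
        foreach (RecognizerInfo info in recognizers)
            Console.WriteLine("{0} ({1})", info.Name, info.Culture.Name);

        // Assumption: prefer en-GB, fall back to en-US.
        CultureInfo chosen = PickCulture(recognizers.Select(r => r.Culture),
                                         new CultureInfo("en-GB"),
                                         new CultureInfo("en-US"));
        Console.WriteLine(chosen == null
            ? "No suitable recognizer installed."
            : "Using " + chosen.Name);
    }
}
```

If this prints nothing for your own culture, that is a strong hint you need to force a different one before creating the engine.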

Some code to illustrate the stages


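using System;
using System.Globalization;
using System.Linq;
using System.Speech.Recognition;
using System.Threading;

namespace ConsoleApplication3
{
    public class SpeechInput
    {
        private SpeechRecognitionEngine _recognizer;
        private readonly Choices _choices = new Choices();

        public event EventHandler<SpeechRecognizedEventArgs> OnRecognizedEvent = delegate { };

        public SpeechInput(CultureInfo culture)
        {
            // Stage 1: force the current thread into a supported culture.
            Thread.CurrentThread.CurrentCulture = culture;
            Thread.CurrentThread.CurrentUICulture = culture;
        }

        // Stage 2: register a phrase the engine should recognise.
        public void AddPhrase(string phrase)
        {
            _choices.Add(phrase);
        }

        public void Initialize()
        {
            // Stage 3: find the installed recognizer that matches our culture.
            RecognizerInfo selectedRecognizer = (from e in SpeechRecognitionEngine.InstalledRecognizers()
                                                 where e.Culture.Equals(Thread.CurrentThread.CurrentCulture)
                                                 select e).FirstOrDefault();
            if (selectedRecognizer == null)
                throw new InvalidOperationException("No speech recognizer is installed for the current culture.");

            // Stage 4: build the grammar once, before recognition starts.
            GrammarBuilder gb = new GrammarBuilder(_choices);
            Grammar g = new Grammar(gb);

            _recognizer = new SpeechRecognitionEngine(selectedRecognizer);
            _recognizer.LoadGrammar(g);
            _recognizer.SetInputToDefaultAudioDevice();          // Stage 5: primary microphone
            _recognizer.SpeechRecognized += SpeechRecognized;    // Stage 6: wire up events
            _recognizer.RecognizeAsync(RecognizeMode.Multiple);  // Stage 7: listen asynchronously
        }

        private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            OnRecognizedEvent(sender, e);
        }
    }
}

using System;
using System.Globalization;
using System.Speech.Recognition;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication3
{
    class Program
    {
        // Dedicated lock object: locking on the string itself is unsafe
        // because the field is reassigned on every recognition.
        private static readonly object TextLock = new object();
        private static string text = "";

        static void Main()
        {
            SpeechInput speechInput = new SpeechInput(new CultureInfo("en-GB"));
            speechInput.OnRecognizedEvent += speechInput_OnRecognizedEvent;

            // Example phrases only; add whichever commands your application needs.
            speechInput.AddPhrase("hello");
            speechInput.AddPhrase("goodbye");
            speechInput.Initialize();

            // Poll for recognised text on a background task and print it.
            Task.Factory.StartNew(() =>
            {
                while (true)
                {
                    lock (TextLock)
                    {
                        if (!String.IsNullOrEmpty(text))
                        {
                            Console.WriteLine(text);
                            text = "";
                        }
                    }
                    Thread.Sleep(50); // avoid spinning the CPU
                }
            });

            Console.ReadLine(); // keep the console open while listening
        }

        static void speechInput_OnRecognizedEvent(object sender, SpeechRecognizedEventArgs e)
        {
            // Ignore results the engine is not confident about.
            if (e.Result.Confidence > 0.92)
            {
                lock (TextLock)
                {
                    text = e.Result.Text;
                }
            }
        }
    }
}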


  • The “confidence” constraint I set in the event handler was used to “tame” the capturing. Without it I could just murmur a vaguely similar noise and it would probably give me a result: “gooooooeye” would result in “goodbye”.
    Confidence ranges from 0 to 1. I found 0.925 worked best for me, but 0.92 may work better for you as it is less strict. (I like speaking overly clearly xD).
  • I have only exposed one event, “OnRecognizedEvent”. There are several other events that I have not shown in this article for you to explore!
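
As a starting point for exploring those other events, here is a small sketch that listens to `SpeechHypothesized` and `SpeechRecognitionRejected` alongside `SpeechRecognized`, so you can watch what the engine is doing while you tune the confidence value. `EventExplorer`, `IsConfident` and `Attach` are hypothetical names used only for illustration, and the sketch assumes the engine you pass in already has a grammar loaded and an input device set.

```csharp
using System;
using System.Speech.Recognition;

public static class EventExplorer
{
    // Hypothetical helper mirroring the confidence check in the event handler.
    public static bool IsConfident(float confidence, float threshold)
    {
        return confidence > threshold;
    }

    // Assumes the engine is already configured (grammar loaded, input set).
    public static void Attach(SpeechRecognitionEngine engine, float threshold)
    {
        // Raised while the engine is still guessing at what was said.
        engine.SpeechHypothesized += (sender, e) =>
            Console.WriteLine("Hypothesis: {0}", e.Result.Text);

        // Raised when the input did not match any phrase well enough.
        engine.SpeechRecognitionRejected += (sender, e) =>
            Console.WriteLine("Rejected: no phrase matched well enough.");

        // Raised on a match; apply our own, stricter confidence threshold.
        engine.SpeechRecognized += (sender, e) =>
        {
            if (IsConfident(e.Result.Confidence, threshold))
                Console.WriteLine("Recognised: {0} ({1:0.000})",
                                  e.Result.Text, e.Result.Confidence);
        };
    }
}
```

Calling something like `EventExplorer.Attach(engine, 0.92f)` before starting recognition should make it easier to see which mumbles are rejected outright and which ones only fail your own threshold.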

I hope that this article is good enough to make up for the lack of decent quality “Speech recognition” tutorials. There is so much more that can be done with this technology and I am really looking forward to seeing how it develops.

- Thanks for reading.
