Dexter Morgan
2011-03-31 19:26:13 UTC
I have two questions regarding real-time speech events:
Question 1: How can we configure the Microsoft Speech Recognition
Engine to fire SpeechDetected events earlier?
Question 2: What is the proper way to feed real-time audio and receive
events like SpeechDetected in real-time? i.e., which SetInputTo*()
method should we use?
Background / Success Story so far:
We've had much success developing a C# application using Microsoft
Speech SDK 10.2 (using the managed Speech libraries:
Microsoft.Speech.*).
Our application runs in real-time, so we need to receive events with
as little delay as possible, particularly SpeechDetected.
Regarding the SetInputTo*() methods:
* At first, we used SetInputToAudioStream(), but we observed that the
engine does not raise any events until it has read the complete
stream, which is unacceptable for us. If there is a way to use this
API and still receive SpeechDetected events before the entire stream
has been read, we are all ears; perhaps we just need to pass it a
particular SpeechAudioFormatInfo.
* As a "workaround", we have been using SetInputToWaveStream() and
passing in our own Stream object; whenever the engine calls Read(),
our Stream feeds it a WAV header followed by real-time audio data.
After some special modifications, this works pretty well: the engine
fires SpeechDetected and other events in real-time instead of
reading the entire stream ahead of time. However, it still fires
SpeechDetected late, about 750ms after the beginning of the
utterance.
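To make the workaround concrete, here is a minimal sketch of the kind
of blocking Stream we mean. It is not our exact code: the names
LiveAudioStream, Push() and Complete() are made up for illustration,
and the WAV header handling and our special modifications are left
out. It only shows the core idea, that Read() blocks until live audio
is available rather than returning 0.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical helper (illustrative names): a read-only stream whose
// Read() blocks until the capture thread has pushed more audio.
public sealed class LiveAudioStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks =
        new BlockingCollection<byte[]>();
    private byte[] _current;  // chunk currently being drained
    private int _offset;      // read position inside _current

    // Producer side: called whenever a new buffer of audio arrives.
    public void Push(byte[] data)
    {
        // Skip empty buffers so Read() never returns 0 by accident.
        if (data != null && data.Length > 0) _chunks.Add(data);
    }

    // Producer side: signals that no more audio will arrive.
    public void Complete()
    {
        _chunks.CompleteAdding();
    }

    // Consumer side: called by whoever reads the stream.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_current == null || _offset == _current.Length)
        {
            // Blocks until a chunk is available. TryTake returns false
            // only once Complete() was called and the queue is empty.
            if (!_chunks.TryTake(out _current, Timeout.Infinite))
                return 0; // end of stream
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length
    {
        get { throw new NotSupportedException(); }
    }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }
    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }
    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
```

In this sketch the capture thread would first Push() the WAV header
bytes and then each audio buffer as it arrives, and the stream object
would be handed to SetInputToWaveStream() before starting recognition.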
For example, we have an utterance consisting of initial silence, then
500ms of loud in-grammar speech, then trailing silence. By monitoring
the engine's calls to our Stream's Read() method, we can see that it
reads the full 500ms of speech plus another 200ms of silence, and
still does not fire the SpeechDetected event until it has read even
more silence.
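For anyone wanting to reproduce these measurements, the following is
a rough sketch of how bytes handed out by Read() can be converted to
audio time. The class name AudioClock is hypothetical, and the format
in the usage note (16 kHz, 16-bit, mono PCM) is only an assumption for
the arithmetic; substitute the actual format of the audio being fed.

```csharp
using System;

// Hypothetical helper: turns a running count of PCM bytes into the
// amount of audio (as a TimeSpan) that has been read so far.
public sealed class AudioClock
{
    private readonly int _bytesPerSecond;
    private long _bytesRead;

    public AudioClock(int samplesPerSecond, int bitsPerSample, int channels)
    {
        // Uncompressed PCM: rate * bytes per sample * channel count.
        _bytesPerSecond = samplesPerSecond * (bitsPerSample / 8) * channels;
    }

    // Call with the value returned by each Read(); returns the total
    // audio time consumed so far.
    public TimeSpan Advance(int byteCount)
    {
        _bytesRead += byteCount;
        return TimeSpan.FromSeconds((double)_bytesRead / _bytesPerSecond);
    }
}
```

With the assumed format, one second of audio is 32000 bytes, so
Advance(16000) on a fresh clock reports 500ms. Any WAV header bytes
should be left out of the count. Logging this value on every Read()
and comparing it against the time at which SpeechDetected fires is
one way to arrive at figures like the ones above.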
Any help in this matter would be appreciated!