Azure Speech SDK Speech-to-Text to Stream Audio Segments
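I've been using Azure's Speech-to-Text service with the "recognize from in-memory stream" approach. What I want to do is stream only certain portions of the audio to the service, but I'm not sure how to go about it. Say I have a video that is 5 minutes long and I only want to stream the first 30 seconds, or only the part of the audio file from the 1 minute mark to the 3 minute mark. What would I need to enable or change in the following code to do that?
I tried using CreatePullStream() instead of CreatePushStream() so that I could supply the markers in seconds, but that did not achieve the goal described above. If anyone knows how to accomplish this, please let me know. Thank you very much!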
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    async static Task FromStream(SpeechConfig speechConfig)
    {
        var reader = new BinaryReader(File.OpenRead("audioFile.wav"));
        using var audioInputStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(audioInputStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        byte[] readBytes;
        do
        {
            readBytes = reader.ReadBytes(1024);
            audioInputStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);

        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromStream(speechConfig);
    }
}
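You can simply use NAudio.Wave to cut your source .wav file. For example, if you want to recognize the content of a .wav file from the 1 minute mark to the 3 minute mark, try the following code: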
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using NAudio.Wave;

class Program
{
    async static Task FromStream(SpeechConfig speechConfig)
    {
        var inputAudioPath = @"<path>";
        var outputAudioPath = @"<path>";
        var startAt = new TimeSpan(0, 1, 0);  // start at the 1 min mark
        var duration = new TimeSpan(0, 2, 0); // keep 2 min of audio, i.e. 1 min to 3 min

        // Write the selected segment to a new .wav file, then recognize that file.
        CutAudio(inputAudioPath, outputAudioPath, startAt, duration);

        using var reader = new BinaryReader(File.OpenRead(outputAudioPath));
        using var audioInputStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(audioInputStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        byte[] readBytes;
        do
        {
            readBytes = reader.ReadBytes(1024);
            audioInputStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);
        audioInputStream.Close(); // signal that no more audio will be written

        // Note: RecognizeOnceAsync returns after a single utterance is recognized.
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    public static void CutAudio(string inputPath, string destPath, TimeSpan startAt, TimeSpan duration)
    {
        using (var reader = new AudioFileReader(inputPath))
        {
            reader.CurrentTime = startAt; // jump forward to the position we want to start from
            WaveFileWriter.CreateWaveFile16(destPath, reader.Take(duration));
        }
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<key>", "<region>");
        await FromStream(speechConfig);
    }
}
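If you would rather not write a temporary file, something along these lines might also work: seek inside the source .wav with NAudio's WaveFileReader and push only the bytes of the wanted window to the service. Treat it as a rough sketch rather than a tested solution. The names SegmentRecognizer, BytesFor and RecognizeSegmentAsync are made up for illustration, and the code assumes the source is an uncompressed PCM .wav in a format the service accepts (16-bit mono at 16 kHz is the safe choice). It uses continuous recognition, since RecognizeOnceAsync returns after a single utterance and a two-minute segment will usually contain several.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using NAudio.Wave;

static class SegmentRecognizer
{
    // Hypothetical helper: number of PCM bytes covering `duration`,
    // rounded down to a whole number of sample frames.
    public static long BytesFor(TimeSpan duration, int sampleRate, int blockAlign)
    {
        long frames = (long)(duration.TotalSeconds * sampleRate);
        return frames * blockAlign;
    }

    // Hypothetical helper: streams only [startAt, startAt + duration) to the recognizer.
    public static async Task RecognizeSegmentAsync(
        SpeechConfig speechConfig, string wavPath, TimeSpan startAt, TimeSpan duration)
    {
        using var reader = new WaveFileReader(wavPath);
        var fmt = reader.WaveFormat; // assumption: uncompressed PCM

        // Describe the raw PCM explicitly, because no WAV header is pushed.
        var streamFormat = AudioStreamFormat.GetWaveFormatPCM(
            (uint)fmt.SampleRate, (byte)fmt.BitsPerSample, (byte)fmt.Channels);

        using var pushStream = AudioInputStream.CreatePushStream(streamFormat);
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var done = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
        };
        recognizer.Canceled += (s, e) =>
        {
            if (e.Reason == CancellationReason.Error)
                Console.WriteLine($"CANCELED: {e.ErrorDetails}");
            done.TrySetResult(false);
        };
        recognizer.SessionStopped += (s, e) => done.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();

        reader.CurrentTime = startAt; // seek to the start of the window
        long remaining = BytesFor(duration, fmt.SampleRate, fmt.BlockAlign);
        var buffer = new byte[fmt.BlockAlign * 1024]; // always a whole number of frames

        while (remaining > 0)
        {
            int toRead = (int)Math.Min(buffer.Length, remaining);
            int read = reader.Read(buffer, 0, toRead);
            if (read <= 0) break; // the file ended before the window did
            pushStream.Write(buffer, read);
            remaining -= read;
        }
        pushStream.Close(); // end of stream lets the session finish

        await done.Task;
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```

You would call it as `await SegmentRecognizer.RecognizeSegmentAsync(speechConfig, @"<path>", new TimeSpan(0, 1, 0), new TimeSpan(0, 2, 0));`. If your file is not already 16-bit PCM, cutting it to a new file with CreateWaveFile16 first is probably the simpler route.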