
Play and stream transcribe audio with python speech_recognition

I am new to Python and am trying to figure out how to transcribe speech from an audio file in real time while the sound plays in the background.

Update:

@petezurich Sorry for the poorly worded question. Currently, I can hear the audio playing in the background. However, I am having trouble getting Sphinx to transcribe the audio. Is there something wrong with the way I am passing the audio to Sphinx? It constantly outputs the "Sphinx error" message.

I am using PocketSphinx with the Uberi/speech_recognition library.

This is what I have put together so far:

#!/usr/bin/env python
# recognitions.py : Transcribe Test from an Audio File
import os
import sys
import time
import wave
import pyaudio
import speech_recognition as sr
import threading

try:
    import pocketsphinx
except:
    print("PocketSphinx is not installed.")

# import audio file within script folder
from os import path
audio_file = path.join(os.path.abspath(os.path.dirname(sys.argv[0])), "samples/OSR_us_000_0061_8k.wav")
print("Transcribing... " + audio_file)
wf = wave.open(audio_file, 'rb')

# set PyAudio instance
pa = pyaudio.PyAudio()

# set recognizer instance (unmodified)
r = sr.Recognizer()

stream_buffer = bytes()
stream_counter = 0
audio_sampling_rate = 48000

def main_recognize(stream):
    global audio_sampling_rate
    # Create a new AudioData instance, which represents "mono" audio data
    audio_data = sr.AudioData(stream, audio_sampling_rate, 2)
    # recognize using CMU Sphinx (en-US only)
    try:
        print("Sphinx: " + r.recognize_sphinx(audio_data, language="en-US"))
    except sr.UnknownValueError:
        print("Sphinx error")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))

def stream_audio(data):
    global stream_buffer
    global stream_counter
    buffer_set_size = 200
    if stream_counter < buffer_set_size:
        # force 'data' to BYTES to allow concat
        data = bytes()
        stream_buffer += data
        stream_counter += 1
    else:
        threading.Thread(target=main_recognize, args=(stream_buffer,)).start()
        # reset
        stream_buffer = bytes()
        stream_counter = 0

# define callback
def callback(in_data, frame_count, time_info, status):
    data = wf.readframes(frame_count)
    stream_audio(in_data)
    return (data, pyaudio.paContinue)

# open audio stream
stream = pa.open(format=pa.get_format_from_width(wf.getsampwidth()),
                 channels=wf.getnchannels(),
                 rate=wf.getframerate(),
                 output=True,
                 stream_callback=callback)

# start the stream
stream.start_stream()

# wait for stream to finish
while stream.is_active():
    time.sleep(0.1)

# stop stream
stream.stop_stream()
stream.close()
wf.close()

# close PyAudio
pa.terminate()

Any advice on what I might be doing wrong?

Is my approach heading in the right direction?

Thank you in advance!

https://github.com/Uberi/speech_recognition/blob/master/reference/library-reference.rst
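
The library reference linked above describes `AudioData(frame_data, sample_rate, sample_width)` as taking raw PCM bytes together with the rate and width they were recorded at. The script in the question hardcodes 48000 Hz, while the file name suggests an 8 kHz recording, so one thing that may be worth checking is whether those values match the file. Below is a minimal sketch, using only the standard library, that reads them from the WAV header instead. `wav_params` and `make_silent_wav` are hypothetical helper names, and the demo WAV is synthetic.

```python
import io
import wave

def wav_params(wav_file):
    """Return (sample_rate, sample_width, channels) read from a WAV header.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as wf:
        return wf.getframerate(), wf.getsampwidth(), wf.getnchannels()

def make_silent_wav(sample_rate=8000, seconds=1):
    """Build a small mono 16-bit WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * sample_rate * seconds)
    buf.seek(0)
    return buf

rate, width, channels = wav_params(make_silent_wav())
print(rate, width, channels)
# These values, rather than constants, could then be passed along, e.g.
# sr.AudioData(frame_bytes, rate, width)
```

With the real file, `wav_params(audio_file)` would report what the header actually declares.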

The Uberi wrapper does not work with streams. You should try something like the original pocketsphinx API instead:

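from os import path
from pocketsphinx.pocketsphinx import Decoder

# Placeholder locations: point these at your own model and test-data directories
MODELDIR = "pocketsphinx/model"
DATADIR = "pocketsphinx/test/data"

config = Decoder.default_config()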
config.set_string('-hmm', path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-lm', path.join(MODELDIR, 'en-us/en-us.lm.bin'))
config.set_string('-dict', path.join(MODELDIR, 'en-us/cmudict-en-us.dict'))
config.set_string('-logfn', '/dev/null')
decoder = Decoder(config)

stream = open(path.join(DATADIR, 'goforward.raw'), 'rb')
#stream = open('10001-90210-01803.wav', 'rb')

in_speech_bf = False
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                decoder.end_utt()
                hyp = decoder.hyp()  # may be None if nothing was recognized
                print('Result:', hyp.hypstr if hyp else '')
                decoder.start_utt()
    else:
        break
decoder.end_utt()
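
To get playback and transcription from the same file, one option is to read each chunk of frames once and hand it to both consumers. Below is a rough sketch of that loop using only the standard library. `pump_wav`, `on_play` and `on_decode` are hypothetical names. In a real script, `on_play` might be the `write` method of a blocking-mode PyAudio output stream, and `on_decode` something like `lambda buf: decoder.process_raw(buf, False, False)`.

```python
import io
import wave

def pump_wav(wav_file, on_play, on_decode, frames_per_chunk=1024):
    """Read a WAV file chunk by chunk, passing every chunk to both callbacks.

    Returns the number of chunks delivered.
    """
    chunks = 0
    with wave.open(wav_file, "rb") as wf:
        while True:
            buf = wf.readframes(frames_per_chunk)
            if not buf:
                break
            on_play(buf)    # e.g. write to an audio output stream
            on_decode(buf)  # e.g. feed a speech decoder
            chunks += 1
    return chunks

# Demonstration with a synthetic one-second, 8 kHz, mono, 16-bit WAV.
demo = io.BytesIO()
with wave.open(demo, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(b"\x00\x00" * 8000)
demo.seek(0)

played, decoded = [], []
n = pump_wav(demo, played.append, decoded.append)
print(n, sum(len(b) for b in played))
```

Because both consumers see the same bytes in the same order, the transcript should stay roughly in step with what is heard. Note that the decoder's default sample rate is 16 kHz, so a file recorded at another rate would probably need resampling or a matching model and configuration.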

