
Processing audio files from s3 bucket for speech to text conversion in Python

I have files stored in an S3 bucket that are uploaded from Genesys PureCloud. They are customer calls saved as .opus files, but they can be converted to .wav files when using a download function in Python. I am having problems processing these files with the Python libraries boto3 and speech_recognition.

I need to be able to pass the audio file (variable f below) to the following script, which first transcribes the audio to text; the text is then run through an NLP algorithm:

import speech_recognition as sr

audio = f  # the audio file to transcribe
r = sr.Recognizer()

with sr.AudioFile(audio) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

I can use the get() method in boto3 to retrieve the object/file from S3:

s3_object = s3_resource.Bucket(bucket).Object('9d3be36b-6777-4b2a-8912-4dbfaaeea0ab/year=2022/month=11/day=10/hour=0/conversation_id=00d32818-16f5-4a6f-a97c-3db50ce9254f/36f1c0ce-d168-4df3-bae8-219bd5b59802.opus').get()

The problem is that the speech recognition library does not accept this as audio data, because it is not reading the object as a file such as .wav or .mp3.
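For context, the value returned by get() is a dict, and the object's bytes sit behind its "Body" entry as a stream with a read() method. A minimal sketch of turning that into a seekable, file-like object follows. The helper name s3_body_to_file and the stand-in response are hypothetical, used only so the sketch runs without AWS access.

import io

def s3_body_to_file(response):
    """Return a seekable in-memory binary file holding the object's bytes."""
    # response["Body"] is a stream; read() returns the raw bytes of the object.
    return io.BytesIO(response["Body"].read())

# Stand-in for a real get() response (assumption: same dict shape as boto3's).
fake_response = {"Body": io.BytesIO(b"RIFF....WAVE")}
buffer = s3_body_to_file(fake_response)
print(buffer.read(4))  # b'RIFF'

With a real response, the returned buffer could be passed to sr.AudioFile, which accepts file-like objects as well as paths. This should only work when the bytes are already WAV, AIFF, or FLAC; the .opus data would need to be converted first.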

Any ideas on how I can pull the files from S3 so that they are in a format I can process? Thanks.

I have tried using the get() method in boto3. I have also tried downloading the object to my local drive. I managed to save it as a .wav file, but when I run it through the speech recognition library I get the following error:

'charmap' codec can't decode byte 0x90 in position 485: character maps to <undefined>


You opened the audio file in text mode. You need to open it in binary mode.

f = open('myfile.wav', 'rb')
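
To illustrate the difference, here is a small self-contained sketch that reproduces the same kind of error using only the standard library. The file name demo.wav and the sample data are made up for the demonstration, and cp1252 is assumed as the text encoding because it is a common Windows default.

import os
import tempfile
import wave

# Build a tiny mono, 16-bit WAV file whose samples contain byte 0x90,
# which the cp1252 code page cannot decode.
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x90\x00" * 800)

# Text mode: Python tries to decode the raw audio bytes as characters and fails.
try:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = str(exc)
print(text_mode_error)

# Binary mode: the bytes are returned untouched and the WAV parses normally.
with open(path, "rb") as f:
    header = f.read(4)
    f.seek(0)
    with wave.open(f, "rb") as w:
        n_frames = w.getnframes()
print(header, n_frames)

The same applies to a file handed to speech_recognition: either pass the path itself to sr.AudioFile, or pass a file object opened with 'rb'.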
