
Recognition of a sound (a word) with machine learning in Python

Posted 2016-09-14 08:56:07 | Tags: python, audio, machine-learning

I'm preparing an experiment, and I want to write a Python program that recognizes certain words spoken by the participants.

I have searched a lot about speech recognition in Python, but the tools I found (e.g. CMUSphinx) are complicated.

What I want is a program that receives a sound file (containing only one word, not in English); I tell the program what the sound means and what output I want to see.

I have seen the sklearn example on recognizing hand-written digits. I want to know whether I can do something similar:

train the program to return a certain output (e.g. a number) from sound files of different people saying the same word;
when given a new sound file of another person saying the same word, return the same value.

Can I do this with Python and sklearn? If so, where should I start?

Thank you!

1 answer

I have written a program like this for text recognition. I can tell you that if you choose to "teach" your program manually, you will have a lot of work: think about the variation in voices due to accents, etc.

You could start by looking for a sound analyzer here (Musical Analysis). Try to identify the waveform of a simple word like "yes" and write an algorithm that expresses, as a percentage, how much a sound file deviates from it. That way you can set a margin to protect yourself from false positives and false negatives.
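
As a rough illustration of the percentage-and-margin idea, here is a minimal sketch using only the standard library. It is not a tested recognizer: the loudness-envelope feature, the function names (`energy_envelope`, `similarity_percent`, `matches`), the 90% margin, and the synthetic `synth_word` signals standing in for real recordings are all assumptions made for this example. It also assumes the clips are already trimmed so that the word starts at the beginning.

```python
import math

def energy_envelope(samples, n_frames=20):
    """Summarize a waveform as n_frames RMS values, scaled so the peak is 1."""
    frame_len = max(1, len(samples) // n_frames)
    env = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0
        env.append(rms)
    peak = max(env) or 1.0  # avoid dividing by zero for an all-silent clip
    return [e / peak for e in env]

def similarity_percent(a, b):
    """Cosine similarity of two envelopes, expressed as a percentage."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 100.0 * dot / norm if norm else 0.0

def matches(template, candidate, margin=90.0):
    """True if the candidate is at least `margin` percent similar to the template."""
    score = similarity_percent(energy_envelope(template), energy_envelope(candidate))
    return score >= margin

def synth_word(n=8000, attack=0.2, freq=220.0, rate=8000, gain=1.0):
    """Hypothetical stand-in for a recording: a tone that swells, then fades."""
    out = []
    for i in range(n):
        t = i / n
        amp = t / attack if t < attack else (1 - t) / (1 - attack)
        out.append(gain * amp * math.sin(2 * math.pi * freq * i / rate))
    return out

template = synth_word(attack=0.2)                          # reference "yes"
same_word = synth_word(attack=0.25, freq=180.0, gain=0.5)  # quieter, lower voice
other_word = synth_word(attack=0.9)                        # very different shape

for name, clip in [("same word", same_word), ("other word", other_word)]:
    score = similarity_percent(energy_envelope(template), energy_envelope(clip))
    print(f"{name}: {score:.1f}% similar, match={matches(template, clip)}")
```

The margin is the knob to tune: raising it cuts false positives at the cost of more false negatives. A loudness envelope ignores pitch and timbre, so real recordings would likely need richer features.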

You might also need to remove background noise from the sound file first, as it may interfere with your identification patterns.
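
One very simple way to approach the noise step is a noise gate, sketched below with the standard library only. This is a swapped-in illustration, not a full noise-reduction method: it only silences the quiet stretches and does nothing about noise underneath the word itself. The function name `noise_gate`, its parameters, and the synthetic test signal are assumptions, and the sketch assumes the first few frames of the clip contain background noise only.

```python
import math
import random

def rms(frame):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0

def noise_gate(samples, frame_len=200, noise_frames=5, factor=3.0):
    """Zero every frame whose RMS is below factor * the estimated noise floor.

    Assumes the first `noise_frames` frames hold background noise only.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    lead = frames[:noise_frames]
    floor = sum(rms(f) for f in lead) / max(1, len(lead))
    threshold = factor * floor
    cleaned = []
    for f in frames:
        cleaned.extend(f if rms(f) >= threshold else [0.0] * len(f))
    return cleaned

# Synthetic clip: faint hiss everywhere, a louder tone in the middle.
rng = random.Random(0)  # fixed seed so the demo is repeatable
noisy = [rng.gauss(0.0, 0.01) for _ in range(4000)]
for i in range(1500, 2500):
    noisy[i] += 0.5 * math.sin(2 * math.pi * 200 * i / 8000)

cleaned = noise_gate(noisy)
kept = sum(1 for x in cleaned if x != 0.0)
print(f"kept {kept} of {len(cleaned)} samples")
```

Gating before comparing clips also helps with trimming, since the leading and trailing silence becomes exact zeros that are easy to cut off.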