Real-time pitch shifting from scratch using Python
I need help with a project that consists of two parts:
I have two questions:
Thanks in advance.
librosa does this. The source is at
https://github.com/librosa/librosa/blob/main/librosa/effects.py#L253
The algorithm used there is summarised by the comment:
# Stretch in time, then resample
To explain this a little more: you can change the pitch by "stretching out" (or squashing) the waveform in the horizontal direction, that is, in time. This would, for example, move the vibrations of Middle C (262 Hz) further apart and thus lower their frequency, and as a result also lower the pitch. Stretching the waveform to double its length (and filling in samples so that the sample rate stays unchanged) would lower the pitch by an octave, to C3 at 131 Hz. On its own, that stretch also doubles the duration, which is why librosa first time-stretches the signal with a phase vocoder (which changes duration without changing pitch), so that the resampling step brings it back to its original length.
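As a minimal from-scratch sketch of the "stretching" idea alone (plain NumPy, not librosa's code), the snippet below stretches a 262 Hz sine by a factor of 2 with linear interpolation while keeping the sample rate fixed. The helper names `stretch_waveform` and `dominant_frequency` are hypothetical. Because it skips the time-stretch step, the result is both an octave lower and twice as long.

```python
import numpy as np

def stretch_waveform(y, factor):
    """Stretch (factor > 1) or squash (factor < 1) a waveform in time using
    linear interpolation, keeping the sample rate unchanged."""
    n_out = int(round(len(y) * factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) / factor
    return np.interp(src_positions, np.arange(len(y)), y)

def dominant_frequency(y, sr):
    """Frequency (Hz) of the strongest peak in the Hann-windowed spectrum."""
    spectrum = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    return freqs[np.argmax(spectrum)]

sr = 22050
t = np.arange(sr) / sr                    # 1 second of samples
middle_c = np.sin(2 * np.pi * 262 * t)

lowered = stretch_waveform(middle_c, 2.0)
print(len(lowered) / sr)                  # 2.0 (seconds): twice as long
print(dominant_frequency(lowered, sr))    # roughly 131 Hz: an octave lower
```

Played back at the same sample rate, every period is twice as long, which is exactly the "further apart" vibrations described above.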
It looks like the hard part is resampling effectively, but the code offers a choice of resampling algorithms for that step (its res_type parameter).
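The sketch below puts both steps together in plain NumPy to mirror the "stretch in time, then resample" comment: a basic phase vocoder changes the duration without changing the pitch, and resampling then changes the pitch and restores the duration. It uses the same rate convention as librosa's `pitch_shift` (`rate = 2 ** (-n_steps / 12)`), but it is an illustration under simple assumptions (Hann window, `n_fft=2048`, `hop=512`, reflect-padded frames), not librosa's implementation. All function names are hypothetical, and linear interpolation is a crude stand-in for the band-limited resamplers librosa offers; it will alias when raising the pitch of bright sounds.

```python
import numpy as np

def stft(y, n_fft=2048, hop=512):
    """Hann-windowed STFT with n_fft//2 reflect padding (centered frames)."""
    y = np.pad(y, n_fft // 2, mode="reflect")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def istft(spec, n_fft=2048, hop=512):
    """Weighted overlap-add inverse of `stft`, trimming the centering pad."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    n = n_fft + hop * (len(frames) - 1)
    y = np.zeros(n)
    norm = np.zeros(n)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    y /= np.maximum(norm, 1e-10)
    return y[n_fft // 2:n - n_fft // 2]

def phase_vocoder(spec, rate, n_fft=2048, hop=512):
    """Time-stretch an STFT by `rate` (>1 = faster/shorter) keeping pitch."""
    n_frames, n_bins = spec.shape
    time_steps = np.arange(0, n_frames - 1, rate)
    # Expected phase advance per hop for the center frequency of each bin.
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    phase = np.angle(spec[0])
    out = np.empty((len(time_steps), n_bins), dtype=complex)
    for k, step in enumerate(time_steps):
        i, frac = int(step), step - int(step)
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # Deviation from the expected advance, wrapped to [-pi, pi].
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi
    return out

def pitch_shift(y, n_steps, n_fft=2048, hop=512):
    """Shift pitch by n_steps semitones: stretch in time, then resample."""
    rate = 2.0 ** (-n_steps / 12.0)
    stretched = istft(phase_vocoder(stft(y, n_fft, hop), rate, n_fft, hop), n_fft, hop)
    # Resample by `rate` with linear interpolation (crude, see lead-in).
    n_out = int(round(len(stretched) * rate))
    shifted = np.interp(np.arange(n_out) / rate, np.arange(len(stretched)), stretched)
    # Pad or trim to exactly the input length.
    if len(shifted) < len(y):
        shifted = np.pad(shifted, (0, len(y) - len(shifted)))
    return shifted[:len(y)]

sr = 22050
t = np.arange(2 * sr) / sr                # 2 seconds
a4 = np.sin(2 * np.pi * 440 * t)
a3 = pitch_shift(a4, n_steps=-12)         # one octave down, same length
```

For real-time use you would have to run this block by block and carry the phase state between blocks, which this offline sketch does not do.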
The first part, which needs code written from scratch, is done here.
For two voices you certainly need two pitches. However, you can use unsupervised training to recognize the speaker, so it is not very hard.
If the voices are mixed and you want to do it without machine learning methods, you can also use the frames that contain each speaker's voice.
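One hypothetical way to act on the "two voices, two pitches" idea without a trained model: estimate a pitch for each short frame from its autocorrelation, then split the frame pitches into two groups with a tiny 1-D k-means (an unsupervised method). This assumes the two speakers take turns rather than talking at once and have clearly different pitches; the frame size, the 60 to 400 Hz search range, the synthetic test signal, and all function names are assumptions for illustration.

```python
import numpy as np

def frame_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame from the highest
    autocorrelation peak whose lag lies between sr/fmax and sr/fmin."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return sr / lag

def frame_pitches(y, sr, frame_len=1024, hop=512):
    """Pitch estimate for every full frame of the signal."""
    starts = range(0, len(y) - frame_len + 1, hop)
    return np.array([frame_pitch(y[s:s + frame_len], sr) for s in starts])

def two_means(values, n_iter=20):
    """1-D k-means with k=2. Label 0 is the lower-pitch group."""
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels, centers

sr = 16000
t = np.arange(sr) / sr
# Synthetic stand-in for two speakers taking turns: 120 Hz, then 220 Hz.
y = np.concatenate([np.sin(2 * np.pi * 120 * t), np.sin(2 * np.pi * 220 * t)])
labels, centers = two_means(frame_pitches(y, sr))
```

Real speech would also need a voicing check (silent or unvoiced frames have no meaningful pitch) before clustering.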
There are also many more robust ways of identifying the speaker with ML, and the most famous features for this are MFCCs (mel-frequency cepstral coefficients), which are explained here.
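For reference, MFCCs can be computed from scratch in a few NumPy steps: frame and window the signal, take the power spectrum, sum it through triangular mel-spaced filters, take the log, and apply a DCT. The sketch below assumes the HTK mel formula, 26 filters, 13 coefficients, 32 ms frames with a 10 ms hop at 16 kHz, and no pre-emphasis or liftering; these are common choices, not necessarily what the linked explanation uses, and the function names are hypothetical. If you do not need to write it yourself, librosa provides `librosa.feature.mfcc`.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=26):
    """Triangular filters spaced evenly on the mel scale (HTK formula)."""
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, len(bin_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def mfcc(y, sr, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    """MFCC matrix of shape (n_frames, n_mfcc)."""
    window = np.hanning(n_fft)
    starts = range(0, len(y) - n_fft + 1, hop)
    frames = np.stack([y[s:s + n_fft] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T
    return dct_ii(np.log(mel_energy + 1e-10), n_mfcc)

sr = 16000
t = np.arange(sr) / sr
coeffs = mfcc(np.sin(2 * np.pi * 220 * t), sr)
print(coeffs.shape)  # (97, 13)
```

Each row is one frame's feature vector, which is what a clustering or classification model for speaker recognition would consume.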