Real-time pitch shifting from scratch in Python

Question

I need help with a project that consists of two parts:

A real-time pitch shifter in Python (from scratch).
Swap the pitches of two voices from two different speakers.

I have two questions:

I couldn't find the proper math behind pitch shifting to implement it from scratch, so a simple explanation would be appreciated.
Do I need to extract the pitches from the two voices to swap them, or is there a simpler solution? If not, an explanation of how to properly extract the pitch from a sound and swap it would be appreciated.

Thanks in advance.

Answer 1

librosa does this. The source is at

https://github.com/librosa/librosa/blob/main/librosa/effects.py#L253

The algorithm used there is summarised by the comment,

# Stretch in time, then resample

To explain this a little more, you can change the pitch by "stretching out" (or squashing) the waveform in the horizontal direction. This would, for example, make the vibrations of Middle C (262 Hz) further apart and thus lower in frequency and, as a result, lower in pitch. Stretching it out to double its length (and then filling in samples so that the sample rate remains unchanged) would lower the pitch by one octave, to C3 at 131 Hz. On its own this also doubles the duration, which is why the library stretches in time first: the two steps together change the pitch while leaving the length unchanged.
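The stretching idea can be tried out with nothing more than NumPy. The following is a minimal sketch, not librosa's code: it assumes linear interpolation as the way of "filling in samples", and the helper names `stretch_waveform` and `dominant_frequency` are made up for this illustration.

```python
import numpy as np

def stretch_waveform(samples, factor):
    """Stretch (factor > 1) or squash (factor < 1) a waveform.

    Played back at the original sample rate, the result has its pitch
    multiplied by 1 / factor and its duration multiplied by factor.
    Linear interpolation is an assumption made for this sketch.
    """
    samples = np.asarray(samples, dtype=float)
    n_out = int(round(len(samples) * factor))
    # Fractional input positions that each output sample reads from.
    positions = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(positions, np.arange(len(samples)), samples)

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate       # one second of audio
middle_c = np.sin(2 * np.pi * 262.0 * t)       # roughly Middle C
lowered = stretch_waveform(middle_c, 2.0)      # twice as long, one octave down

print(dominant_frequency(middle_c, sample_rate))
print(dominant_frequency(lowered, sample_rate))
```

Note that the lowered tone is also twice as long as the input, which is the side effect the time-stretch step is meant to cancel.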

It looks like the hard part is resampling effectively, but a variety of algorithms are mentioned in the code.
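For a from-scratch version of "stretch in time, then resample", one possible shape is sketched below. This is an illustration under assumptions rather than librosa's implementation: it uses a basic phase vocoder for the time stretch and plain linear interpolation for the resample, and the names `stft`, `istft`, `time_stretch`, `pitch_shift` as well as the defaults `n_fft=1024` and `hop=256` are choices made for this example. The pitch ratio for `n_steps` semitones is taken as 2^(n_steps/12).

```python
import numpy as np

def stft(x, n_fft, hop):
    """Short-time Fourier transform with a Hann window (no padding)."""
    window = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(window * x[s:s + n_fft]) for s in starts])

def istft(frames, n_fft, hop):
    """Overlap-add inverse STFT, normalised by the summed squared window."""
    window = np.hanning(n_fft)
    out = np.zeros((len(frames) - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    for k, spec in enumerate(frames):
        s = k * hop
        out[s:s + n_fft] += window * np.fft.irfft(spec, n=n_fft)
        norm[s:s + n_fft] += window ** 2
    nonzero = norm > 1e-6
    out[nonzero] /= norm[nonzero]
    return out

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch: rate < 1 lengthens, rate > 1 shortens."""
    spec = stft(x, n_fft, hop)
    # Phase advance per hop expected at each bin's centre frequency.
    expected = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
    phase = np.angle(spec[0])
    frames = []
    for t in np.arange(0, len(spec) - 1, rate):
        i = min(int(t), len(spec) - 2)
        frac = t - i
        # Interpolate magnitudes between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        frames.append(mag * np.exp(1j * phase))
        # Deviation of the measured phase advance, wrapped to [-pi, pi].
        delta = np.angle(spec[i + 1]) - np.angle(spec[i]) - expected
        delta -= 2 * np.pi * np.round(delta / (2 * np.pi))
        phase = phase + expected + delta
    return istft(frames, n_fft, hop)

def pitch_shift(x, n_steps):
    """Shift pitch by n_steps semitones while keeping the input length."""
    x = np.asarray(x, dtype=float)
    factor = 2.0 ** (n_steps / 12.0)             # frequency ratio
    stretched = time_stretch(x, 1.0 / factor)    # step 1: stretch in time
    n_out = int(len(stretched) / factor)
    # Step 2: resample by reading the stretched signal at a step of `factor`.
    resampled = np.interp(np.arange(n_out) * factor,
                          np.arange(len(stretched)), stretched)
    out = np.zeros(len(x))                       # crop or zero-pad to input length
    n = min(len(x), n_out)
    out[:n] = resampled[:n]
    return out

sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 262.0 * t)
shifted = pitch_shift(tone, 12)                  # up one octave, same length
```

This version processes a whole array at once. A real-time variant would have to run the same analysis and synthesis on successive blocks from the audio device and carry the accumulated phase across blocks, which is not shown here. Linear interpolation is also a crude resampler; it does no anti-aliasing filtering.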

Answer 2

The first part, which needs code written from scratch, is done here

For two voices you definitely need two pitches. However, you can use unsupervised training to recognize the speaker, so it is not very hard.
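The answer does not say how to obtain the two pitches. One common basic approach, offered here as an assumption rather than as what the answer intended, is to pick the strongest autocorrelation peak inside a plausible range of voice periods. The names `estimate_pitch` and `semitones_between`, and the 60 to 500 Hz search range, are hypothetical choices for this sketch.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Uses the lag of the largest autocorrelation value between the
    periods corresponding to fmax and fmin.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags only.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(corr) - 1)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag

def semitones_between(f_from, f_to):
    """Number of semitones needed to move f_from to f_to."""
    return 12.0 * np.log2(f_to / f_from)

sample_rate = 8000
t = np.arange(1024) / sample_rate
voice_a = np.sin(2 * np.pi * 200.0 * t)   # stand-in for speaker A
voice_b = np.sin(2 * np.pi * 125.0 * t)   # stand-in for speaker B

pitch_a = estimate_pitch(voice_a, sample_rate)
pitch_b = estimate_pitch(voice_b, sample_rate)
# To swap pitches, shift A by this many semitones and B by the negative.
steps_a_to_b = semitones_between(pitch_a, pitch_b)
```

Real speech is harder than these synthetic tones: pitch varies from frame to frame and unvoiced frames have no clear period, so the estimate would normally be computed per frame and smoothed.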

You can also use frames containing each speaker's voice if the voices are mixed and you want to do it without machine learning methods.

There are also many more robust ways of identifying the speaker with ML; the most famous uses MFCC, which is explained here.
