I need help with a project which consists of 2 parts:
I have 2 questions:
Thanks in advance.
librosa does this. The source is at https://github.com/librosa/librosa/blob/main/librosa/effects.py#L253
The algorithm used there is summarised by the comment:
# Stretch in time, then resample
To explain this a little more: you can change the pitch by "stretching out" (or squashing) the waveform in the horizontal direction. This would, for example, space the vibrations of Middle C (C4, 262 Hz) further apart and thus lower the frequency -- and, as a result, the pitch. Stretching the waveform to double its length (and interpolating new samples so that the sample rate stays unchanged) would drop the pitch by an octave, to C3 at 131 Hz.
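The octave-drop example above can be sketched with plain NumPy. This is only an illustration of the principle (a sine tone and a naive linear-interpolation stretch), not librosa's actual resampler:

```python
import numpy as np

sr = 22050
t = np.arange(sr) / sr                   # 1 second of samples
y = np.sin(2 * np.pi * 262.0 * t)        # Middle C (C4) at 262 Hz

# Stretch the waveform to double its length via linear interpolation,
# keeping the sample rate fixed: the frequency (and pitch) drops an octave.
n = len(y)
stretched = np.interp(np.arange(2 * n) / 2.0, np.arange(n), y)

def dominant_freq(x, sr):
    """Return the frequency of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), 1.0 / sr)[np.argmax(spectrum)]

print(round(dominant_freq(y, sr)))          # 262
print(round(dominant_freq(stretched, sr)))  # 131
```

Note that this naive stretch changes the duration along with the pitch; librosa's approach avoids that by first time-stretching with a phase vocoder (which preserves pitch) and only then resampling.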
It looks like the hard part is resampling effectively, but a variety of algorithms are mentioned in the code.
The first part, which needs scratch code, is done here.
For two voices you certainly need two pitches; however, you can use unsupervised training to recognise the speaker, so it is not very hard.
You can also work from frames containing each speaker's voice if the voices are mixed and you want to do it without machine learning methods.
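As a toy illustration of the unsupervised idea: cluster per-frame features and let the clusters stand in for speakers. Here synthetic tones at two pitches stand in for the two voices, and a single dominant-frequency estimate is the feature (a real system would use richer features, e.g. MFCCs):

```python
import numpy as np

sr = 8000
frame = 1024

def tone(freq, n_frames):
    """n_frames consecutive frames of a sine at the given frequency."""
    t = np.arange(n_frames * frame) / sr
    return np.sin(2 * np.pi * freq * t).reshape(n_frames, frame)

# Toy "speakers": five frames at 120 Hz, then five at 220 Hz
frames = np.vstack([tone(120.0, 5), tone(220.0, 5)])

# Per-frame dominant frequency as a 1-D feature
spec = np.abs(np.fft.rfft(frames, axis=1))
feat = np.fft.rfftfreq(frame, 1 / sr)[np.argmax(spec, axis=1)]

# Tiny 2-means clustering, initialised at the feature extremes
centers = np.array([feat.min(), feat.max()])
for _ in range(10):
    labels = np.argmin(np.abs(feat[:, None] - centers[None, :]), axis=1)
    centers = np.array([feat[labels == k].mean() for k in (0, 1)])

print(labels)  # first 5 frames in one cluster, last 5 in the other
```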
There are also many more robust ways of identifying the speaker with ML; the most famous features are MFCCs, which are explained here.
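For reference, here is a minimal from-scratch MFCC sketch showing the usual pipeline (frame, power spectrum, mel filterbank, log, DCT). The parameter values are common defaults, not taken from any particular library, and a production system would use librosa or python_speech_features instead:

```python
import numpy as np

def mfcc(sig, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    # Frame with a Hann window and take the power spectrum
    frames = np.lib.stride_tricks.sliding_window_view(sig, n_fft)[::hop]
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2

    # Triangular mel filterbank
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)

    # Log mel energies, then DCT-II to get cepstral coefficients
    logmel = np.log(power @ fbank.T + 1e-10)
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * k + 1) / (2 * n_mels))
    return logmel @ dct.T

sr = 16000
sig = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
coeffs = mfcc(sig, sr)
print(coeffs.shape)  # one row of 13 coefficients per frame
```

Per-frame MFCC vectors like these are exactly the kind of feature you would feed to the clustering step above.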