
How do I find a scale between two different audio samples?

I'm planning to make a universal application that analyses audio samples. By 'universal' I mean that any technology (JavaScript, C, Java, etc.) can use it. I made an application on iOS, using Apple's AVFoundation, that receives microphone samples in real time in buffers of length 512 (bufferSize = 512). In Python I did the same thing using PyAudio, but unfortunately I got very different values...

Here are the samples:

Samples of bufferSize = 512 on iOS:

[0.0166742969, 0.0181432627, 0.0184620395, 0.0182254426, 0.0181945376, 0.0185530782, 0.0192517322, 0.0199078992, 0.0204724055, 0.0212812237, 0.022370765, 0.0230008475, 0.0225516111, 0.0213304944, 0.0200473778, 0.019841563, 0.0206818394, 0.0211550407, 0.0207783803, 0.020227218 ....

Samples of bufferSize = 512 on Python:

[ -52.  -32.  -11.   10.   24.   31.   37.   38.   33.   25.   10.   -4.
  -18.  -26.  -29.  -39. ....

For more:

https://pastebin.com/jrM2VWXR

The Python code:

https://gist.github.com/denisb411/7c6f601175e8bb9f735d8aa43a0db340

In both cases I used the same computer.

How do I 'convert' them (I don't know if this is the proper word) to the same scale?

If the question isn't clear, please let me know.

Audio samples are typically quantized on 16 or 24 bits. But there are different conventions about the range of values these samples can take:

  • if you quantize on 8 bits, samples are usually stored as unsigned bytes, ranging from 0 to 255
  • if you quantize on 16 bits, samples are usually stored as 2's-complement signed integers, ranging from -32768 to 32767
  • if you quantize on 24 bits, samples are usually stored as 2's-complement signed integers, ranging from -8388608 to 8388607
  • etc.
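As a rough illustration of how the range follows from the bit depth and the signed/unsigned choice, here is a minimal sketch. The helper name pcm_range is made up for this example and is not part of PyAudio or any other library.

```python
def pcm_range(bits, signed=True):
    """Return (min, max) for an integer PCM sample of the given bit depth.

    Hypothetical helper for illustration only.
    """
    if signed:
        # 2's-complement: one more negative value than positive values
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # unsigned: starts at zero
    return 0, (1 << bits) - 1


print(pcm_range(8, signed=False))  # (0, 255)
print(pcm_range(16))               # (-32768, 32767)
```

Dividing a signed sample by 2 ** (bits - 1) maps it into roughly [-1, 1], which is the usual floating-point convention.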

Basically, when you decide to store samples, you have two parameters:

  • signed or unsigned
  • int or float

Each has its advantages and drawbacks. For instance, storing samples as floats in the range [-1, 1] has the advantage that the product of two samples always stays within [-1, 1]…

So, to answer your question, you just need to change the format with which you open your PyAudio stream. Currently, you use format=pyaudio.paInt16. Try changing it to pyaudio.paFloat32, and you should get data on the same scale as your iOS implementation. If your code decodes the raw buffer with an explicit 16-bit integer type, that has to change to 32-bit float as well.
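If you would rather keep the stream in paInt16, you can also rescale the integers yourself. The following is only a sketch: it assumes a mono stream whose bytes are little-endian 16-bit samples, and the function name int16_bytes_to_floats is made up for this example. It uses only the standard library, so it runs without a microphone.

```python
import struct

INT16_FULL_SCALE = 32768.0  # 2 ** 15, magnitude of the most negative int16


def int16_bytes_to_floats(raw):
    """Decode little-endian 16-bit PCM bytes into floats in [-1.0, 1.0).

    Hypothetical helper; assumes mono, little-endian data.
    """
    count = len(raw) // 2  # two bytes per sample
    ints = struct.unpack("<%dh" % count, raw[:count * 2])
    return [s / INT16_FULL_SCALE for s in ints]


# First few Python samples from the question, packed the way an
# int16 buffer would hold them (assumption: little-endian).
raw = struct.pack("<4h", -52, -32, -11, 10)
print(int16_bytes_to_floats(raw))
```

Either way, this only puts both captures on the same scale. The two platforms may still apply different input gain or processing, so the values themselves will not necessarily match sample for sample.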


 