
scipy spectrogram with logarithmic frequency axis?

Playing with scipy.signal.spectrogram. Works fine for what it is.

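from scipy.io import wavfile
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt

# sf is the sample rate in Hz, audio holds the samples
sf, audio = wavfile.read('serious.wav')
# Average the channels into one mono signal (assumes a multi-channel file)
sig = np.mean(audio, axis=1)
f, t, Sxx = signal.spectrogram(sig, sf, scaling='spectrum')

# Time on the x axis, frequency on the y axis, log of the power as color
plt.pcolormesh(t, f, np.log10(Sxx))
plt.ylabel('f [Hz]')
plt.xlabel('t [sec]')
plt.show()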

This is the result:

[image: spectrogram]

But the frequency axis is linear. For audio this is not often desirable - at any rate, it's not what I want.

Is there a way to coax scipy.signal.spectrogram to output a logarithmic frequency scale?

If this is not doable with scipy, could you recommend an equally simple approach to obtain this result?


EDIT: The problem is not in the way the image is displayed. The problem is in the way the data is generated by signal.spectrogram().

I've changed the code like this:

plt.pcolormesh(t, f, np.log10(Sxx))
plt.ylabel('f [Hz]')
plt.xlabel('t [sec]')
plt.yscale('log')
plt.savefig('spec.png')
plt.show()

And now the image looks like this:

[image: spectrogram]

The f vector (generated by signal.spectrogram()) looks like this:

array([    0.      ,   172.265625,   344.53125 ,   516.796875,
         689.0625  ,   861.328125,  1033.59375 ,  1205.859375,
        1378.125   ,  1550.390625,  1722.65625 ,  1894.921875,
        2067.1875  ,  2239.453125,  2411.71875 ,  2583.984375,
...
       19982.8125  , 20155.078125, 20327.34375 , 20499.609375,
       20671.875   , 20844.140625, 21016.40625 , 21188.671875,
       21360.9375  , 21533.203125, 21705.46875 , 21877.734375,
       22050.      ])

That's a linear distribution. I need far more points in the lower end, and far fewer at the top end.

I've found the problem. The FFT produces linearly spaced frequency bins, but my image is logarithmic. The default interval between frequencies is too big in the lower part of the frequency spectrum.

So I just upped the number of frequency samples via the nperseg parameter. In this example nperseg equals the sample rate, so the distance between successive frequencies is 1 Hz, which is pretty good resolution. Also, symlog scaling works best here (unlike 'log', it can handle the 0 Hz bin at the start of f).

npts = int(sf)
f, t, Sxx = signal.spectrogram(sig, sf, nperseg=npts)
plt.yscale('symlog')
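
The arithmetic behind this change can be sketched in plain Python. This is only an illustration: it assumes a 44100 Hz sample rate (consistent with the f vector above, which ends at 22050 Hz) and a segment length of 256 samples, which is what produces the 172.265625 Hz step. The helper names bin_spacing and bins_below are made up for this example and are not part of scipy.

```python
def bin_spacing(sf, nperseg):
    """Distance in Hz between neighbouring FFT bins for segments of nperseg samples."""
    return sf / nperseg


def bins_below(sf, nperseg, limit_hz):
    """Count the bins of the one-sided spectrum that lie strictly below limit_hz."""
    n_bins = nperseg // 2 + 1  # 0 Hz up to the Nyquist frequency
    return sum(1 for k in range(n_bins) if k * sf / nperseg < limit_hz)


sf = 44100  # assumed sample rate in Hz

# 256-sample segments: coarse spacing, very few bins in the low range
coarse_step = bin_spacing(sf, 256)
coarse_low = bins_below(sf, 256, 1000)

# Segment length equal to the sample rate: one bin per Hz
fine_step = bin_spacing(sf, sf)
fine_low = bins_below(sf, sf, 1000)

print(coarse_step, coarse_low)
print(fine_step, fine_low)
```

Under these assumptions, only 6 bins fall below 1 kHz with 256-sample segments, against 1000 bins when nperseg equals the sample rate. The price is time resolution, since each segment then spans a full second of audio.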

Of course, then there are too many frequencies at the top of the range, so some pruning is required within the f and Sxx arrays (dimensions must match, so prune them both the same way). Also, the range of displayed frequencies must be limited to 10 - 20000 Hz or some reasonable values. All these optimizations are beyond the scope of this answer.

But I brought the script to the point where it's usable and I put it on GitHub:

https://github.com/FlorinAndrei/soundspec

Here's an example of a working spectrogram:

[image]
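
For the pruning step mentioned above, one possible approach is sketched below. It is not taken from the soundspec script. It simply keeps the existing bins that sit nearest to a log-spaced grid of target frequencies, so f and Sxx are pruned the same way. The function name prune_to_log_grid, the default of 500 output rows, and the synthetic f and Sxx arrays are all assumptions made for illustration.

```python
import numpy as np


def prune_to_log_grid(f, Sxx, f_min=10.0, f_max=20000.0, n_out=500):
    """Keep only the rows of Sxx whose frequency is nearest to a log-spaced grid.

    f   : 1-D ascending array of frequencies, as returned by signal.spectrogram
    Sxx : 2-D array of shape (len(f), number_of_time_steps)
    """
    # Log-spaced target frequencies between f_min and f_max
    targets = np.geomspace(f_min, f_max, n_out)

    # For each target, find the index of the nearest existing bin
    idx = np.searchsorted(f, targets)
    idx = np.clip(idx, 1, len(f) - 1)
    left_is_closer = (targets - f[idx - 1]) <= (f[idx] - targets)
    idx = np.where(left_is_closer, idx - 1, idx)

    # Several low-frequency targets can map to the same bin; keep each bin once
    idx = np.unique(idx)

    # Apply the same selection to both arrays so their dimensions still match
    return f[idx], Sxx[idx, :]


# Synthetic stand-ins for the output of signal.spectrogram with 1 Hz spacing
f = np.arange(0, 22051, dtype=float)
Sxx = np.random.default_rng(0).random((len(f), 4))

f_log, S_log = prune_to_log_grid(f, Sxx)
print(len(f), len(f_log), f_log[0], f_log[-1])
```

The pruned arrays can then go to plt.pcolormesh(t, f_log, np.log10(S_log)), with plt.ylim(10, 20000) limiting the displayed range. Note that this picks single bins and discards the rest; averaging neighbouring bins instead would be a different design choice.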
