
Invertible STFT and ISTFT in Python

Is there any general-purpose form of short-time Fourier transform with a corresponding inverse transform built into SciPy or NumPy or whatever?

There's the pyplot specgram function in matplotlib, which calls ax.specgram(), which calls mlab.specgram(), which calls _spectral_helper():

 #The checks for if y is x are so that we can use the same function to
 #implement the core of psd(), csd(), and spectrogram() without doing
 #extra calculations. We return the unaveraged Pxy, freqs, and t.

but

This is a helper function that implements the commonality between the psd, csd, and spectrogram. It is NOT meant to be used outside of mlab

I'm not sure if this can be used to do an STFT and ISTFT, though. Is there anything else, or should I translate something like these MATLAB functions?

I know how to write my own ad-hoc implementation; I'm just looking for something full-featured, which can handle different windowing functions (but has a sane default), is fully invertible with COLA windows (istft(stft(x))==x), has been tested by multiple people, has no off-by-one errors, handles the ends and zero padding well, uses a fast RFFT implementation for real input, etc.

Here is my Python code, simplified for this answer:

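import numpy as np
import pylab  # only needed for the plots in the test

def stft(x, fs, framesz, hop):
    framesamp = int(framesz*fs)   # frame length in samples
    hopsamp = int(hop*fs)         # hop length in samples
    w = np.hanning(framesamp)
    # One FFT per windowed frame; np.array stacks the frames into a 2D array.
    X = np.array([np.fft.fft(w*x[i:i+framesamp])
                  for i in range(0, len(x)-framesamp, hopsamp)])
    return X

def istft(X, fs, T, hop):
    x = np.zeros(int(T*fs))
    framesamp = X.shape[1]
    hopsamp = int(hop*fs)
    # Naive overlap-add of the inverse FFT of each frame.
    for n,i in enumerate(range(0, len(x)-framesamp, hopsamp)):
        x[i:i+framesamp] += np.real(np.fft.ifft(X[n]))
    return x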

Notes:

  1. The list comprehension is a little trick I like to use to simulate block processing of signals in numpy/scipy. It's like blkproc in Matlab. Instead of a for loop, I apply a command (e.g., fft) to each frame of the signal inside a list comprehension, and then the array constructor casts the result to a 2D array. I use this to make spectrograms, chromagrams, MFCC-grams, and much more.
  2. For this example, I use a naive overlap-and-add method in istft. In order to reconstruct the original signal, the sum of the sequential window functions must be constant, preferably equal to unity (1.0). In this case, I've chosen the Hann (or hanning) window and a 50% overlap, which works perfectly. See this discussion for more information.
  3. There are probably more principled ways of computing the ISTFT. This example is mainly meant to be educational.
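The constant-sum condition in note 2 can be checked numerically. The following is only a sketch: the helper name overlap_add_envelope, the frame length of 400 samples, and the frame count of 10 are assumptions chosen for illustration. It sums shifted copies of a window at 50% overlap and measures how far the sum deviates from a constant in the fully overlapped interior, for both the symmetric np.hanning(N) window and the periodic variant np.hanning(N+1)[:-1].

```python
import numpy as np

def overlap_add_envelope(window, hop, n_frames):
    """Sum n_frames copies of `window`, each shifted by `hop` samples."""
    n = len(window)
    env = np.zeros(hop * (n_frames - 1) + n)
    for k in range(n_frames):
        env[k * hop:k * hop + n] += window
    return env

N = 400          # frame length in samples (assumed for this sketch)
hop = N // 2     # 50% overlap

symmetric = np.hanning(N)          # first and last samples are both zero
periodic = np.hanning(N + 1)[:-1]  # one period of the raised cosine

# Only look at the interior, where every sample is covered by two frames.
interior = slice(N, -N)
env_sym = overlap_add_envelope(symmetric, hop, 10)[interior]
env_per = overlap_add_envelope(periodic, hop, 10)[interior]

ripple_sym = env_sym.max() - env_sym.min()
ripple_per = env_per.max() - env_per.min()
print("symmetric Hann ripple:", ripple_sym)
print("periodic Hann ripple: ", ripple_per)
```

Under these assumptions the periodic window sums to a constant up to rounding error, while the symmetric window leaves a small ripple, so the naive overlap-add is only approximately exact with np.hanning(N).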

A test:

import numpy as np

if __name__ == '__main__':
    f0 = 440         # Compute the STFT of a 440 Hz sinusoid
    fs = 8000        # sampled at 8 kHz
    T = 5            # lasting 5 seconds
    framesz = 0.050  # with a frame size of 50 milliseconds
    hop = 0.025      # and hop size of 25 milliseconds.

    # Create test signal and STFT.
    t = np.linspace(0, T, T*fs, endpoint=False)
    x = np.sin(2*np.pi*f0*t)
    X = stft(x, fs, framesz, hop)

    # Plot the magnitude spectrogram.
    pylab.figure()
    pylab.imshow(np.absolute(X.T), origin='lower', aspect='auto',
                 interpolation='nearest')
    pylab.xlabel('Time')
    pylab.ylabel('Frequency')
    pylab.show()

    # Compute the ISTFT.
    xhat = istft(X, fs, T, hop)

    # Plot the input and output signals over the first 0.1 seconds.
    T1 = int(0.1*fs)

    pylab.figure()
    pylab.plot(t[:T1], x[:T1], t[:T1], xhat[:T1])
    pylab.xlabel('Time (seconds)')

    # Plot the input and output signals over the last 0.1 seconds.
    pylab.figure()
    pylab.plot(t[-T1:], x[-T1:], t[-T1:], xhat[-T1:])
    pylab.xlabel('Time (seconds)')
    pylab.show()
[Figures: STFT of the 440 Hz sinusoid; ISTFT at the beginning of the 440 Hz sinusoid; ISTFT at the end of the 440 Hz sinusoid]

Here is the STFT code that I use. STFT + ISTFT here gives perfect reconstruction (even for the first frames). I slightly modified the code given here by Steve Tjoa: here the magnitude of the reconstructed signal is the same as that of the input signal.

import numpy as np

def stft(x, fftsize=1024, overlap=4):
    hop = fftsize // overlap               # integer hop size in samples
    w = np.hanning(fftsize+1)[:-1]         # better reconstruction with this trick +1)[:-1]
    return np.array([np.fft.rfft(w*x[i:i+fftsize]) for i in range(0, len(x)-fftsize, hop)])

def istft(X, overlap=4):
    fftsize = (X.shape[1]-1)*2
    hop = fftsize // overlap
    w = np.hanning(fftsize+1)[:-1]
    x = np.zeros(X.shape[0]*hop)
    wsum = np.zeros(X.shape[0]*hop)
    for n,i in enumerate(range(0, len(x)-fftsize, hop)):
        x[i:i+fftsize] += np.real(np.fft.irfft(X[n])) * w   # overlap-add
        wsum[i:i+fftsize] += w ** 2.                         # accumulated squared window
    pos = wsum != 0
    x[pos] /= wsum[pos]                                      # undo the window weighting
    return x

librosa.core.stft and istft look pretty similar to what I was looking for, though they didn't exist at the time:

librosa.core.stft(y, n_fft=2048, hop_length=None, win_length=None, window=None, center=True, dtype=<type 'numpy.complex64'>)

They don't invert exactly, though; the ends are tapered.

I'm a little late to this, but I realized that SciPy has had a built-in istft function since version 0.19.0.
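As a rough illustration of that built-in pair, here is a minimal round-trip sketch using scipy.signal.stft and scipy.signal.istft. The test tone, the sample rate, and the nperseg/noverlap values are assumptions picked for the example (a Hann window with 50% overlap); the same window parameters have to be passed to both calls.

```python
import numpy as np
from scipy import signal

fs = 8000                                  # sample rate in Hz (assumed)
t = np.arange(fs) / fs                     # one second of samples
x = np.sin(2 * np.pi * 440 * t)            # 440 Hz test tone

# Forward transform: Hann window, 400-sample frames, 50% overlap.
f, frame_times, Zxx = signal.stft(x, fs=fs, window='hann',
                                  nperseg=400, noverlap=200)

# Inverse transform with matching parameters.
_, xrec = signal.istft(Zxx, fs=fs, window='hann',
                       nperseg=400, noverlap=200)

# The output can be longer than the input if padding was added; trim it.
xrec = xrec[:len(x)]
max_err = np.max(np.abs(x - xrec))
print("max reconstruction error:", max_err)
```

With a window and overlap that satisfy the COLA constraint, the reconstruction error should be on the order of floating-point rounding, including at the ends of the signal.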

Neither of the above answers worked well out of the box for me, so I modified Steve Tjoa's.

import numpy as np

def stft(x, fs, framesz, hop):
    """
     x - signal
     fs - sample rate
     framesz - frame size (seconds)
     hop - hop size in seconds (frame size = overlap + hop size)
    """
    framesamp = int(framesz*fs)
    hopsamp = int(hop*fs)
    w = np.hamming(framesamp)
    X = np.array([np.fft.fft(w*x[i:i+framesamp])
                  for i in range(0, len(x)-framesamp, hopsamp)])
    return X

def istft(X, fs, T, hop):
    """ T - signal length (seconds) """
    length = int(T*fs)
    x = np.zeros(length)
    framesamp = X.shape[1]
    hopsamp = int(hop*fs)
    for n,i in enumerate(range(0, len(x)-framesamp, hopsamp)):
        x[i:i+framesamp] += np.real(np.fft.ifft(X[n]))
    # calculate the inverse envelope to scale results at the ends.
    env = np.zeros(length)
    w = np.hamming(framesamp)
    for i in range(0, len(x)-framesamp, hopsamp):
        env[i:i+framesamp] += w
    tail = length % hopsamp
    if tail:  # a slice starting at -0 would select the whole array, so skip it
        env[-tail:] += w[-tail:]
    env = np.maximum(env, .01)
    return x/env # right side is still a little messed up...

Found another STFT, but no corresponding inverse function:

http://code.google.com/p/pytfd/source/browse/trunk/pytfd/stft.py

def stft(x, w, L=None):
    ...
    return X_stft
  • w is a window function as an array
  • L is the overlap, in samples

I also found this on GitHub, but it seems to operate on pipelines instead of normal arrays:

http://github.com/ronw/frontend/blob/master/basic.py#LID281

def STFT(nfft, nwin=None, nhop=None, winfun=np.hanning):
    ...
    return dataprocessor.Pipeline(Framer(nwin, nhop), Window(winfun),
                                  RFFT(nfft))


def ISTFT(nfft, nwin=None, nhop=None, winfun=np.hanning):
    ...
    return dataprocessor.Pipeline(IRFFT(nfft), Window(winfun),
                                  OverlapAdd(nwin, nhop))

I think scipy.signal has what you are looking for. It has reasonable defaults, supports multiple window types, etc.

http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.signal.spectrogram.html

from scipy.signal import spectrogram
freq, time, Spec = spectrogram(signal)
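A slightly fuller version of that call might look like the sketch below. The 440 Hz test tone and the nperseg/noverlap values are assumptions for illustration only. Note that with its default mode, spectrogram returns a real-valued power spectral density, so the phase needed for an inverse transform is not retained.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000                                  # sample rate in Hz (assumed)
t = np.arange(fs) / fs                     # one second of samples
x = np.sin(2 * np.pi * 440 * t)            # 440 Hz test tone

# Rows of Sxx are frequency bins, columns are time segments.
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=400, noverlap=200)

# Frequency bin with the most power, averaged over time.
peak_hz = freqs[np.argmax(Sxx.mean(axis=1))]
print(Sxx.shape, peak_hz)
```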

A fixed version of basj's answer.

import numpy as np
import matplotlib.pyplot as plt

def stft(x, fftsize=1024, overlap=4):
    hop = fftsize//overlap
    w = np.hanning(fftsize+1)[:-1]      # better reconstruction with this trick +1)[:-1]
    return np.vstack([np.fft.rfft(w*x[i:i+fftsize]) for i in range(0, len(x)-fftsize, hop)])

def istft(X, overlap=4):
    fftsize = (X.shape[1]-1)*2
    hop = fftsize//overlap
    w = np.hanning(fftsize+1)[:-1]
    rcs = int(np.ceil(float(X.shape[0])/float(overlap)))*fftsize   # output length in samples
    x = np.zeros(rcs)
    wsum = np.zeros(rcs)
    for n,i in zip(X,range(0,len(X)*hop,hop)):
        l = len(x[i:i+fftsize])                      # the last frames may run past the buffer
        x[i:i+fftsize] += np.fft.irfft(n).real[:l]   # overlap-add
        wsum[i:i+fftsize] += w[:l]                   # accumulated analysis window
    pos = wsum != 0
    x[pos] /= wsum[pos]
    return x

a = np.random.random((65536))
b = istft(stft(a))
plt.plot(range(len(a)),a,range(len(b)),b)
plt.show()

If you have access to a C binary library that does what you want, then use http://code.google.com/p/ctypesgen/ to generate a Python interface to that library.
