How do I know which spectrogram frames belong to which audio samples?

I've been using this script:

import torchaudio

spgram = torchaudio.transforms.Spectrogram(n_fft=512, hop_length=32)
spec = spgram(audio)  # audio: [channels, samples] -> spec: [channels, freq_bins, frames]

to get the spectrogram of some stereo music audio. I expected the resulting spectrogram to have the shape [2, 257, audio.shape[1]/32]. However, that's not the case. For example, an audio clip of size [2, 199488] (with sr=24576) yields a spectrogram of size [2, 257, 6241] (note that 199488/32 = 6234). Why is that? And how can I convert from a frame location to a sample location?

See the center parameter of torchaudio.transforms.Spectrogram:

whether to pad waveform on both sides so that the t-th frame is centered at time t × hop_length. (Default: True)

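So, by default, the signal is padded on both sides before it is split into frames. In torch.stft the padding is n_fft // 2 samples per side and the default pad_mode is "reflect", not zeros. With centering, frame t is centered on sample t * hop_length, so converting a frame index to a sample location is a multiplication by hop_length. The frame count in this mode is 1 + floor(n_samples / hop_length), which is why the result is longer than n_samples / hop_length. For your clip that formula gives 6235 frames; the remaining difference from the 6241 you report is not explained by centering alone, so it is worth checking the length of the tensor actually passed to the transform.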
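A minimal sketch of the frame/sample bookkeeping, assuming the framing documented for torch.stft with center=True: n_fft // 2 samples of padding on each side, and win_length equal to n_fft. The helper names below are hypothetical and not part of torch or torchaudio; the code is pure Python, so it only mirrors the arithmetic and does not call the library.

```python
def num_frames(n_samples, n_fft, hop_length, center=True):
    """Frame count under torch.stft-style framing (assumes win_length == n_fft)."""
    # With center=True the signal grows by n_fft // 2 samples on each side.
    padded = n_samples + 2 * (n_fft // 2) if center else n_samples
    return 1 + (padded - n_fft) // hop_length


def frame_to_samples(t, n_fft, hop_length, n_samples, center=True):
    """Half-open range [start, stop) of original samples covered by frame t."""
    # Centered frames start n_fft // 2 samples before their center t * hop_length.
    offset = n_fft // 2 if center else 0
    start = t * hop_length - offset
    stop = start + n_fft
    # Clip away the part of the window that falls into the padding.
    return max(start, 0), min(stop, n_samples)


def sample_to_frame(s, hop_length):
    """Index of the frame whose center is nearest to sample s (center=True only)."""
    return (s + hop_length // 2) // hop_length


if __name__ == "__main__":
    print(num_frames(199488, n_fft=512, hop_length=32))
    print(frame_to_samples(100, n_fft=512, hop_length=32, n_samples=199488))
    print(sample_to_frame(3200, hop_length=32))
```

Under these assumptions, frame 100 is centered on sample 3200 and covers samples 2944 to 3455. The same count formula gives 801 frames for a 128000-sample signal with n_fft=400 and hop_length=160 (values inferred here from the shapes quoted in the comment, not stated in the thread).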

Thanks for your answers. If I have a signal x of size [1, 128000], I expect it to be 800 frames, but torch.stft(x).size() = [1, 201, 801, 2]. I want to align the frames of torch.stft(x) to 800 frames. Can I drop the last frame and keep only the first 800?
