
How do I stretch the x-axis of a matplotlib spectrogram?

Sorry if this is a really obvious question. I am using matplotlib to generate spectrograms for use as training data in a machine learning model. The spectrograms are of short clips of music, and I want to simulate speeding up or slowing down the song by a random amount to create variations in the data. My code for generating each spectrogram is shown below. I have temporarily modified it to produce two images starting at the same point in the song, one with variation and one without, so that I can compare them and check whether it is working as intended.

from pydub import AudioSegment
import matplotlib.pyplot as plt
import numpy as np

BPM_VARIATION_AMOUNT = 0.2
FRAME_RATE = 22050
CHUNK_SIZE = 2
BUFFER = FRAME_RATE * 5

def generate_random_specgram(track):
    # Read audio data from file
    audio = AudioSegment.from_file(track.location)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    start = np.random.randint(BUFFER, len(samples) - BUFFER)
    chunk = samples[start:start + int(CHUNK_SIZE * FRAME_RATE)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, track.bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs = FRAME_RATE)
    plt.savefig(filename)
    plt.close()

    # Perform random variations to the BPM
    frame_rate = FRAME_RATE
    bpm = track.bpm
    variation = 1 - BPM_VARIATION_AMOUNT + (
        np.random.random() * BPM_VARIATION_AMOUNT * 2)
    bpm *= variation
    bpm = round(bpm, 2)
    # I thought this next line should have been /= but that stretched the wrong way?
    frame_rate *= (bpm / track.bpm) 

    # Read audio data from file
    chunk = samples[start:start + int(CHUNK_SIZE * frame_rate)]

    # Plot specgram and save to file
    filename = ('specgrams/%s-%s-%s.png' % (track.trackid, start, bpm))
    plt.figure(figsize=(2.56, 0.64), frameon=False).add_axes([0, 0, 1, 1])
    plt.axis('off')
    plt.specgram(chunk, Fs = frame_rate)
    plt.savefig(filename)
    plt.close()

I thought that changing the Fs parameter passed to the specgram function would stretch the data along the x-axis. Instead, it seems to resize the whole graph and introduce white space at the top of the image in strange and unpredictable ways. I'm sure I'm missing something, but I can't see what it is. Below is an image to illustrate what I'm getting.

[Image: spectrogram example]

The frame rate is a fixed number that depends only on your data. If you change it, you will effectively "stretch" the x-axis, but in the wrong way. For example, if you have 1000 data points that correspond to 1 second, your frame rate (or, better, sampling frequency) is 1000 Hz. If your signal is a simple 200 Hz sine whose frequency increases slightly over time, the specgram will be:

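import numpy as np
import matplotlib.pyplot as plt

# 1000 samples spanning 1 second
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)

frame_rate = 1000  # correct sampling frequency for this data
plt.specgram(signal, Fs=frame_rate);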

[Image: spectrogram with Fs=1000]

If you change the frame rate, both the x- and y-axis scales will be wrong. If you set the frame rate to 500, you get:

t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)

frame_rate = 500
plt.specgram(signal, Fs=frame_rate);

[Image: spectrogram with Fs=500]

The plot looks very similar, but this time it is wrong: the x-axis spans almost 2 seconds when it should span only 1, and the starting frequency reads 100 Hz instead of 200 Hz.
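The rescaling described above can also be checked numerically, without drawing anything. This is a minimal sketch, assuming `matplotlib.mlab.specgram` with its default settings; the variable names are illustrative and not from the original post.

```python
import numpy as np
from matplotlib import mlab

# Same test signal as above: 1000 samples covering 1 second
t = np.linspace(0, 1, 1000)
signal = np.sin((200*2*np.pi + 200*t) * t)

# Compute the spectrogram twice: with the correct rate, then with half of it
spec_full, freqs_full, times_full = mlab.specgram(signal, Fs=1000)
spec_half, freqs_half, times_half = mlab.specgram(signal, Fs=500)

# Index of the frequency bin holding the most power in the first time slice
peak_bin_full = int(np.argmax(spec_full[:, 0]))
peak_bin_half = int(np.argmax(spec_half[:, 0]))

print("top of y-axis:", freqs_full[-1], "vs", freqs_half[-1])
print("last time bin:", times_full[-1], "vs", times_half[-1])
print("peak frequency:", freqs_full[peak_bin_full], "vs", freqs_half[peak_bin_half])
```

The strongest bin is the same in both runs; only its label changes. Halving Fs halves every frequency label and doubles every time label, which is why the picture looks alike while the axes are wrong.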


To conclude, the sampling frequency you set needs to be the correct one. If you want to stretch the plot, you can use something like plt.xlim(0.2, 0.4). If you want to avoid the white band at the top of the plot, you can manually set the ylim to half the frame rate:

plt.ylim(0, frame_rate/2)

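This works because of basic properties of the Fourier transform and the Nyquist-Shannon theorem: a signal sampled at a given frame rate can only represent frequencies up to half that rate, so the spectrogram contains no data above frame_rate/2.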
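Putting the axis limits together, a borderless image could be produced roughly as follows. This is only a sketch under stated assumptions: the off-screen `Agg` backend and the in-memory buffer are choices made here so the example runs anywhere, and the x limits are taken from the image extent rather than hard-coded, because for a signal this short the spectrogram does not quite reach the ends of the time range.

```python
import io
import matplotlib
matplotlib.use("Agg")  # off-screen backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

frame_rate = 1000
t = np.linspace(0, 1, frame_rate)
signal = np.sin((200*2*np.pi + 200*t) * t)

# Axes filling the whole figure, with ticks and frame hidden
fig = plt.figure(figsize=(2.56, 0.64), frameon=False)
ax = fig.add_axes([0, 0, 1, 1])
ax.axis("off")
spectrum, freqs, times, image = ax.specgram(signal, Fs=frame_rate)

# Pin the x-axis to the drawn image and the y-axis to 0..Nyquist
left, right, bottom, top = image.get_extent()
ax.set_xlim(left, right)
ax.set_ylim(0, frame_rate / 2)

buffer = io.BytesIO()  # stand-in for a file on disk
fig.savefig(buffer, format="png")
plt.close(fig)
```

With both limits fixed explicitly, the result no longer depends on how matplotlib chooses its automatic axis range.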

The solution to my problem was to set the xlim and ylim of the plot. Here is the code from my testing file, in which I finally got rid of all the odd whitespace:

from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt

BUFFER = 5
FRAME_RATE = 22050
SAMPLE_LENGTH = 2

def plot(audio_file, bpm, variation=1):
    audio = AudioSegment.from_file(audio_file)
    audio = audio.set_channels(1).set_frame_rate(FRAME_RATE)
    samples = audio.get_array_of_samples()
    chunk_length = int(FRAME_RATE * SAMPLE_LENGTH * variation)
    start = np.random.randint(
        BUFFER * FRAME_RATE,
        len(samples) - (BUFFER * FRAME_RATE) - chunk_length)
    chunk = samples[start:start + chunk_length]

    plt.figure(figsize=(5.12, 2.56)).add_axes([0, 0, 1, 1])
    plt.specgram(chunk, Fs=FRAME_RATE * variation)
    plt.xlim(0, SAMPLE_LENGTH)
    plt.ylim(0, FRAME_RATE / 2 * variation)
    plt.savefig('specgram-%f.png' % (bpm * variation))
    plt.close()
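To see why the x-axis stays at SAMPLE_LENGTH seconds whatever the variation, the arithmetic can be checked on its own. This is a small sketch; `displayed_duration` is a hypothetical helper introduced here for illustration and is not part of the code above.

```python
FRAME_RATE = 22050
SAMPLE_LENGTH = 2

def displayed_duration(variation):
    """Seconds spanned on the x-axis for a given variation (hypothetical helper)."""
    # Number of samples cut from the song, as in plot()
    chunk_length = int(FRAME_RATE * SAMPLE_LENGTH * variation)
    # Value passed to specgram as Fs, as in plot()
    plotted_rate = FRAME_RATE * variation
    return chunk_length / plotted_rate

for variation in (0.8, 1.0, 1.2):
    print(variation, displayed_duration(variation))
```

The chunk length and Fs are scaled by the same factor, so their ratio stays at about 2 seconds (up to integer truncation). A variation above 1 therefore fits more of the song into the same image width, and the frequency axis grows by the same factor, which is why ylim is scaled by variation as well.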
