
Python: Reconstruct audio file from STFT

As a simple experiment, I want to compute the STFT of an audio file:

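import numpy as np
import scipy.io.wavfile as wav
from scipy.signal import stft, istft

# `file` is the path to the input WAV file.
sample_rate, samples = wav.read(file)

f, t, Zxx = stft(samples, sample_rate)
_, reconstructed = istft(Zxx, sample_rate)
# istft can return more samples than the input, so zero-pad the original to match.
padded_samples = np.zeros_like(reconstructed)
padded_samples[:len(samples)] = samples
print(np.sum(padded_samples - reconstructed))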

Output: -1.37309940428. Pretty small, isn't it, given that samples has shape (9218368,)?
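A note on the check above: a signed sum lets positive and negative errors cancel, so a small value does not by itself prove a good reconstruction. The sketch below compares it with the maximum absolute error and the RMS error. It uses a synthetic 440 Hz tone as a stand-in for the real file (an assumption for illustration; the sample rate and amplitude are arbitrary).

```python
import numpy as np
from scipy.signal import stft, istft

# Synthetic stand-in for the audio: one second of a 440 Hz tone as int16.
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
samples = (10000 * np.sin(2 * np.pi * 440 * time)).astype(np.int16)

_, _, Zxx = stft(samples, sample_rate)
_, reconstructed = istft(Zxx, sample_rate)

# istft may return a longer, zero-padded signal; compare only the overlap.
n = min(len(samples), len(reconstructed))
error = reconstructed[:n] - samples[:n]

signed_sum = np.sum(error)                 # positive and negative errors cancel
max_abs_error = np.max(np.abs(error))      # worst single-sample deviation
rms_error = np.sqrt(np.mean(error ** 2))   # typical deviation per sample

print(signed_sum, max_abs_error, rms_error)
```

If the maximum absolute error is well below one quantization step, the STFT round trip itself is fine and any audible noise has to come from a later step.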

test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, reconstructed)

The reconstructed file sounds terrible. The original is barely audible underneath the noise. Have I made a mistake, or is it simply impossible to recover an audio file from its STFT?

Do you have any other suggestions on how to convert an audio file to some kind of processable data and then reconstruct it from that? What other kinds of data structures can be used to process audio files?

Thank you.

EDIT:

As suggested by Warren:

print (samples.shape)
print (samples.dtype)
print (reconstructed.dtype)

Output:

(9218368,)
int16
float64

According to the scipy docs, int and float input have different meanings when writing a WAV file. I tried casting reconstructed to np.int16:

rounded_reconstructed = np.rint(reconstructed).astype(np.int16)

test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, rounded_reconstructed)

The result is barely distinguishable from the original. Thank you for the help.
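For reference, a minimal sketch of the whole round trip. As far as I understand the scipy docs, floating-point arrays are written as floating-point WAV data, where samples are conventionally expected in the range -1.0 to 1.0, so float values on an int16 scale are far outside that range. The helper `to_int16` is hypothetical (not part of SciPy); it adds clipping so that values slightly outside the int16 range saturate rather than wrap around. The input is again a synthetic mono tone, which is an assumption for illustration.

```python
import io
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft, istft

def to_int16(signal):
    """Round a float signal to int16, clipping instead of wrapping around."""
    info = np.iinfo(np.int16)
    return np.clip(np.rint(signal), info.min, info.max).astype(np.int16)

# Synthetic stand-in for the original recording (assumption: mono int16).
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
samples = (10000 * np.sin(2 * np.pi * 440 * time)).astype(np.int16)

_, _, Zxx = stft(samples, sample_rate)
_, reconstructed = istft(Zxx, sample_rate)

# Drop the trailing zero padding so the lengths match again, then convert.
restored = to_int16(reconstructed[:len(samples)])

# Write to an in-memory buffer and read it back to check the round trip.
buffer = io.BytesIO()
wavfile.write(buffer, sample_rate, restored)
buffer.seek(0)
rate_read, data_read = wavfile.read(buffer)

# Largest difference, in int16 steps, between the original and the file contents.
max_diff = np.max(np.abs(data_read.astype(np.int32) - samples.astype(np.int32)))
print(data_read.dtype, rate_read, max_diff)
```

An alternative under the same assumption would be to keep the float data but scale it into [-1.0, 1.0] (for example by dividing by 32768) and write it as float32.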

