Python: Reconstructing an audio file from the STFT

Question

As a simple experiment, I want to compute the STFT of an audio file and then invert it:

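import numpy as np
from scipy.io import wavfile as wav
from scipy.signal import stft, istft

sample_rate, samples = wav.read(file)

f, t, Zxx = stft(samples, sample_rate)
_, reconstructed = istft(Zxx, sample_rate)
# istft can return a few extra samples, so zero-pad the original to match
padded_samples = np.zeros_like(reconstructed)
padded_samples[:len(samples)] = samples
print(np.sum(padded_samples - reconstructed))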

Output: -1.37309940428. Pretty small, isn't it, given that samples has shape (9218368,)?

test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, reconstructed)

The reconstructed file sounds terrible. The original is barely audible underneath the noise. Have I made a mistake, or is it simply impossible to recover an audio file from the STFT?

Do you have any other suggestions on how to convert an audio file to some kind of processable data and then reconstruct it from that? What other kinds of data structures can be used to process audio files?

Thank you.

EDIT:

As suggested by Warren:

print(samples.shape)
print(samples.dtype)
print(reconstructed.dtype)

Output:

(9218368,)
int16
float64

According to the SciPy docs, int and float input have different meanings when writing a WAV file. I tried casting reconstructed to np.int16:

rounded_reconstructed = np.rint(reconstructed).astype(np.int16)

test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, rounded_reconstructed)

The result is barely distinguishable from the original. Thank you for the help.
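A plain cast to np.int16 is not guaranteed to saturate when a reconstructed value lands slightly outside the int16 range, so it may be safer to clip before casting. A minimal sketch, where to_int16 is a hypothetical helper name and the sample values are made up for illustration:

```python
import numpy as np

def to_int16(signal):
    """Round a float signal and clip it to the int16 range before casting."""
    info = np.iinfo(np.int16)
    return np.clip(np.rint(signal), info.min, info.max).astype(np.int16)

# Made-up values, including two that fall outside the int16 range.
reconstructed = np.array([-40000.0, -1.4, 0.5, 1.5, 40000.0])
print(to_int16(reconstructed))  # [-32768     -1      0      2  32767]
```

Note that np.rint rounds halves to the nearest even integer, which is why 0.5 becomes 0 and 1.5 becomes 2.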

Answer 1

As suggested by Warren:

print(samples.shape)
print(samples.dtype)
print(reconstructed.dtype)

Output:

(9218368,)
int16
float64

According to the SciPy docs, int and float input have different meanings when writing a WAV file. I tried casting reconstructed to np.int16:

rounded_reconstructed = np.rint(reconstructed).astype(np.int16)

test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, rounded_reconstructed)

The result is barely distinguishable from the original. Thank you for the help.
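An alternative to casting back to int16 would be to keep floating-point data but rescale it to the range [-1.0, 1.0], which is the range the scipy.io.wavfile documentation lists for float WAV data. The sketch below is only an illustration under assumptions: a synthetic tone replaces the real reconstructed signal, and an in-memory buffer replaces the output file.

```python
import io
import numpy as np
from scipy.io import wavfile as wav

sample_rate = 8000  # assumed rate for the synthetic signal
t = np.arange(sample_rate) / sample_rate
# float64 values on the int16 amplitude scale, like the istft output above
reconstructed = 10000.0 * np.sin(2 * np.pi * 440 * t)

# Rescale from the int16 scale to [-1.0, 1.0) and store as float32.
as_float = (reconstructed / 32768.0).astype(np.float32)

buffer = io.BytesIO()  # in-memory stand-in for 'reconstructed.wav'
wav.write(buffer, sample_rate, as_float)
buffer.seek(0)
rate_back, data_back = wav.read(buffer)
print(rate_back, data_back.dtype, np.abs(data_back).max())
```

Writing the unscaled float64 array directly produces a float WAV whose samples are thousands of times larger than players expect, which is consistent with the heavy distortion described in the question.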

Answer 1
0 Accepted 2017-12-27 10:23:48
