Python: Reconstruct audio file from STFT
As a simple experiment, I want to compute the STFT of an audio file and then reconstruct the signal from it:
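import os
import numpy as np
import scipy.io.wavfile as wav
from scipy.signal import stft, istft

sample_rate, samples = wav.read(file)
f, t, Zxx = stft(samples, sample_rate)
_, reconstructed = istft(Zxx, sample_rate)
# istft output can be longer than the input, so zero-pad the original before comparing
padded_samples = np.zeros_like(reconstructed)
padded_samples[:len(samples)] = samples
print(np.sum(padded_samples - reconstructed))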
Output: -1.37309940428
Pretty small, isn't it, given that samples is of shape (9218368,)?
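A signed sum can look small even when individual samples differ a lot, because positive and negative errors cancel. The sketch below compares it with the maximum absolute error and the RMS error. It uses a synthetic 440 Hz tone as a stand-in for the audio file (an assumption made only to keep the example self-contained) and the default stft/istft parameters.

```python
import numpy as np
from scipy.signal import stft, istft

# Synthetic stand-in for the audio file: one second of a 440 Hz tone as int16.
sample_rate = 8000
time = np.arange(sample_rate) / sample_rate
samples = (10000 * np.sin(2 * np.pi * 440 * time)).astype(np.int16)

# Work in float64 so the transform and the comparison use the same dtype.
_, _, Zxx = stft(samples.astype(np.float64), sample_rate)
_, reconstructed = istft(Zxx, sample_rate)

# istft may return extra trailing samples, so compare only the overlapping part.
n = min(len(samples), len(reconstructed))
diff = samples[:n].astype(np.float64) - reconstructed[:n]

signed_sum = np.sum(diff)                # opposite-sign errors can cancel here
max_abs_error = np.max(np.abs(diff))     # worst single-sample error
rms_error = np.sqrt(np.mean(diff ** 2))  # typical error magnitude
print(signed_sum, max_abs_error, rms_error)
```

If the maximum absolute error is also tiny, the round trip itself is fine and the problem is more likely in how the result is written to disk.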
test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, reconstructed)
The reconstructed file sounds terrible. The original is barely audible underneath the noise. Have I made a mistake, or is it simply impossible to recover an audio file from the STFT?
Do you have any other suggestions on how to convert an audio file to some kind of processable data and then reconstruct it from that? What other kinds of data structures can be used to process audio files?
Thank you.
EDIT:
As suggested by Warren:
print (samples.shape)
print (samples.dtype)
print (reconstructed.dtype)
Output:
(9218368,)
int16
float64
According to the scipy docs, int and float input have different meanings when writing a wav file. I tried casting reconstructed to np.int16:
rounded_reconstructed = np.rint(reconstructed).astype(np.int16)
test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, rounded_reconstructed)
The result is barely distinguishable from the original. Thank you for the help.
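A likely explanation is that scipy writes a float array as a floating-point WAV, where samples are expected to lie in roughly [-1.0, 1.0], so float64 values on the int16 scale play back heavily clipped. The sketch below shows one way to do the conversion defensively: round, clip to the int16 range so small overshoots do not wrap around, then cast. The helper name to_int16 and the sample values are hypothetical and used only for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def to_int16(signal):
    """Round a float signal on the int16 scale and clip it before casting."""
    info = np.iinfo(np.int16)
    return np.clip(np.rint(signal), info.min, info.max).astype(np.int16)


# Hypothetical stand-in for `reconstructed`: float64 values on the int16 scale,
# including two that overshoot the representable range.
sample_rate = 8000
reconstructed = np.array([0.4, -1.6, 40000.0, -40000.0, 123.4])
as_int16 = to_int16(reconstructed)

# Write the int16 data and read it back to confirm the dtype survives.
with tempfile.TemporaryDirectory() as temp_folder:
    test_file = os.path.join(temp_folder, 'reconstructed.wav')
    wavfile.write(test_file, sample_rate, as_int16)
    rate_back, data_back = wavfile.read(test_file)

print(as_int16, rate_back, data_back.dtype)
```

Without the clip, a value just above 32767 would wrap to a large negative number on the cast and show up as a click in the audio.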