Python: Reconstruct an audio file from its STFT

Question

As a simple experiment, I want to compute the STFT of an audio file and then invert it:

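import os
import numpy as np
from scipy.io import wavfile as wav
from scipy.signal import stft, istft

sample_rate, samples = wav.read(file)

f, t, Zxx = stft(samples, sample_rate)
_, reconstructed = istft(Zxx, sample_rate)
# istft can return more samples than the input, so zero-pad the original to match
padded_samples = np.zeros_like(reconstructed)
padded_samples[:len(samples)] = samples
print(np.sum(padded_samples - reconstructed))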

Output: -1.37309940428. That is quite small, isn't it, given that samples has shape (9218368,)?
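A side note on the check itself: a signed sum lets positive and negative errors cancel, so a small sum does not by itself show that every sample matches. Below is a minimal sketch of a stricter comparison. It assumes a synthetic 440 Hz tone as a stand-in for the WAV data, scipy's default STFT parameters, and an explicit cast to float64 before the transform (the code above passes int16 directly).

```python
import numpy as np
from scipy.signal import stft, istft

# Hypothetical stand-in for the WAV data: one second of a 440 Hz tone as int16.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
samples = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Forward and inverse STFT with scipy's default window and overlap.
# The float64 cast is an assumption of this sketch, not part of the question's code.
_, _, Zxx = stft(samples.astype(np.float64), sample_rate)
_, reconstructed = istft(Zxx, sample_rate)

# istft may return extra padded samples, so compare only the common prefix.
n = len(samples)
error = reconstructed[:n] - samples

signed_sum = np.sum(error)                 # positive and negative errors can cancel
max_abs_error = np.max(np.abs(error))      # worst-case deviation of any one sample
rms_error = np.sqrt(np.mean(error ** 2))   # typical deviation per sample

print(signed_sum, max_abs_error, rms_error)
```

If the maximum absolute error is far below one quantization step, the STFT round trip is numerically fine and any audible problem is more likely to come from how the data is written to disk.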

test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, reconstructed)

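The reconstructed file sounds terrible. The original is barely audible beneath the noise. Have I made a mistake, or is it simply impossible to recover an audio file from its STFT?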

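Do you have any other suggestions for converting an audio file into some processable representation and then reconstructing the audio from it? What other kinds of data structures could be used to work with audio files?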

Thanks.

Answer 1

As Warren suggested, I checked the shape and dtypes:

print(samples.shape)
print(samples.dtype)
print(reconstructed.dtype)

Output:

(9218368,)
int16
float64

According to the scipy docs, int and float inputs have different meanings when writing a wav file. I tried casting reconstructed to np.int16:

rounded_reconstructed = np.rint(reconstructed).astype(np.int16)

test_file = os.path.join(temp_folder, 'reconstructed.wav')
wav.write(test_file, sample_rate, rounded_reconstructed)

The result is almost indistinguishable from the original. Thanks for your help.
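For reference, here is a minimal sketch of two ways to hand a float signal to `scipy.io.wavfile.write`: as int16 after rounding and clipping, or as float32 scaled into [-1.0, 1.0]. The helper names `to_int16` and `to_float32`, the synthetic signal, and the temporary output path are all hypothetical. The clipping step is an added precaution rather than part of the fix above, since a reconstruction that slightly overshoots the int16 range could otherwise wrap around when cast.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile


def to_int16(signal):
    """Round a float signal and clip it to the int16 range before casting."""
    info = np.iinfo(np.int16)
    return np.clip(np.rint(signal), info.min, info.max).astype(np.int16)


def to_float32(signal):
    """Scale a signal given in int16 units into the [-1.0, 1.0] float range."""
    scaled = np.asarray(signal, dtype=np.float64) / 32768.0
    return np.clip(scaled, -1.0, 1.0).astype(np.float32)


# Hypothetical float64 signal in int16 units that slightly overshoots the range.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
reconstructed = 32800.0 * np.sin(2 * np.pi * 440 * t)

as_int16 = to_int16(reconstructed)
as_float32 = to_float32(reconstructed)

# Write the int16 version to a temporary file and read it back to check it.
with tempfile.TemporaryDirectory() as temp_folder:
    test_file = os.path.join(temp_folder, 'reconstructed.wav')
    wavfile.write(test_file, sample_rate, as_int16)
    rate_back, data_back = wavfile.read(test_file)

print(rate_back, data_back.dtype, as_float32.dtype)
```

Either representation should play back at a sensible level. Writing float data that still carries int16-scale values is the case to avoid, because float WAV samples are expected to lie between -1.0 and 1.0.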

Solution 1: score 0, accepted, 2017-12-27 10:23:48