Batching audio data in TensorFlow

Question

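I am trying to train some models on audio data. I wrote some code that loads MP3 files, splits them into short clips (about 0.1 seconds each), and processes those clips in batches. Here is the code.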

import glob
import tensorflow as tf
from tensorflow.contrib import ffmpeg

def load(fname):
    binary = tf.read_file(fname)
    return ffmpeg.decode_audio(binary, file_format='mp3', samples_per_second=44100, channel_count=2)   

def preprocess(audio, seconds_per_sample=0.1, rate=44100):
    # pad with 1 second of silence at the front and back
    front = tf.zeros([rate, 2], dtype=audio.dtype)
    back = tf.zeros([rate - tf.mod(tf.shape(audio)[0], rate) + rate, 2], dtype=audio.dtype)
    audio = tf.concat([front, audio, back], 0)
    # normalize to 0 to 1 range
    audio = tf.add(audio, tf.abs(tf.reduce_min(audio)))
    audio = tf.multiply(audio, 1.0 / tf.reduce_max(audio))
    # [data, channels] => [samples, data, channels]
    audio = tf.reshape(audio, [-1, int(rate * seconds_per_sample), 2])
    return audio

tf.reset_default_graph()
with tf.Graph().as_default():
    # take files one by one and read data from them
    files = glob.glob('music/*.mp3')    
    queue = tf.train.string_input_producer(files, num_epochs=1)
    fname = queue.dequeue()
    audio = load(fname)
    audio = preprocess(audio)
    samples = tf.train.slice_input_producer([audio], num_epochs=1)
    batch = tf.train.batch(samples, 10)

    model = tf.identity(batch)

    init = [tf.global_variables_initializer(), tf.local_variables_initializer()]

    coord = tf.train.Coordinator()

    with tf.Session() as session:
        session.run(init)
        threads = tf.train.start_queue_runners(sess=session, coord=coord)
        for _ in range(10):
            try:
                result = session.run(model)
            except tf.errors.OutOfRangeError:
                coord.request_stop()
        coord.request_stop()
        coord.join(threads)

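This seemed straightforward, since a similar approach worked for my previous models. I reshape the audio data so that the first dimension indexes the samples, queue the samples with slice_input_producer, and then use batch() to feed 10 samples at a time into the model. For simplicity, I left the model as an identity function. This code makes Python segfault somewhere deep inside TensorFlow. Is there anything obviously wrong in what I am doing?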
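The padding and reshape arithmetic described above can be checked outside TensorFlow. The following is only a NumPy sketch that mirrors preprocess(), not the code that crashes: the name `preprocess_np` and the random `fake` input are hypothetical, and it assumes stereo audio at 44100 Hz.

```python
import numpy as np

def preprocess_np(audio, seconds_per_sample=0.1, rate=44100):
    """NumPy mirror of preprocess(), intended only for checking shapes."""
    channels = audio.shape[1]
    # One second of silence in front; the back padding rounds the length
    # up to a whole second and then adds one more second.
    front = np.zeros((rate, channels), dtype=audio.dtype)
    back_len = rate - audio.shape[0] % rate + rate
    back = np.zeros((back_len, channels), dtype=audio.dtype)
    audio = np.concatenate([front, audio, back], axis=0)
    # Shift so the minimum is 0, then scale so the maximum is 1.
    audio = audio + np.abs(audio.min())
    audio = audio / audio.max()
    # [data, channels] => [samples, data, channels]
    frame = int(round(rate * seconds_per_sample))
    return audio.reshape(-1, frame, channels)

# Hypothetical stand-in for decoded audio: 100000 stereo frames in [-1, 1).
rng = np.random.default_rng(0)
fake = rng.uniform(-1.0, 1.0, size=(100000, 2)).astype(np.float32)
clips = preprocess_np(fake)
print(clips.shape)  # (50, 4410, 2)
```

Because the total length is always a multiple of `rate`, the reshape succeeds whenever `rate` is divisible by the clip length. One side effect visible in this sketch: the shift is applied after padding, so the padded silence no longer sits at 0 in the normalized output.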

Here is the start of the OSX crash report:

Process:               Python [57865]
Path:                  /usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Version:               3.6.1 (3.6.1)
Code Type:             X86-64 (Native)
Parent Process:        Python [57654]
Responsible:           Python [57865]
User ID:               502

Date/Time:             2017-04-12 16:07:13.318 -0400
OS Version:            Mac OS X 10.12.3 (16D32)
Report Version:        12
Anonymous UUID:        B5DE676B-FEC7-9626-B1CC-F392948D410C

Sleep/Wake UUID:       F3A5360E-B7A0-4675-9DC9-EAEE938E2E70

Time Awake Since Boot: 440000 seconds
Time Since Wake:       16000 seconds

System Integrity Protection: disabled

Crashed Thread:        16

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called

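Edit: The issue I opened on GitHub was closed with no explanation other than "see the issue tracking policy". I am not sure what else I can do here. If anyone has any insight into this problem, please share it.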

Answer 1

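Before I could run the code you posted on my machine, I had to add some MP3 files to the "music" folder. I assume you already have some audio there, but also check the ffmpeg binary. TensorFlow requires ffmpeg to be located in the /usr/local/sbin/ folder.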

Quick fix

The usual symlink worked for me.

ln -s /usr/bin/ffmpeg /usr/local/sbin/ffmpeg
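Before creating the symlink, it may help to confirm where ffmpeg actually lives. This is a small sketch using only the Python standard library; the helper name `ffmpeg_locations` is hypothetical, and the default path is the one mentioned in this answer.

```python
import os
import shutil

def ffmpeg_locations(expected="/usr/local/sbin/ffmpeg"):
    """Report where ffmpeg is found on PATH and whether the expected path exists."""
    return {
        "on_path": shutil.which("ffmpeg"),          # None if ffmpeg is not on PATH
        "expected": expected,
        "expected_exists": os.path.exists(expected),
    }

info = ffmpeg_locations()
print(info)
```

If `on_path` points somewhere other than /usr/bin/ffmpeg, use that path as the source of the symlink instead.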

If this answer does not help, please provide more information by running the code in a terminal emulator and posting the traceback here.
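When the process aborts inside native code, Python normally prints no traceback at all. One way to capture one, sketched here under the assumption that the standard-library `faulthandler` module is acceptable, is to enable it at the top of the script. The temporary log file and the name `crash_log` are hypothetical choices for illustration.

```python
import faulthandler
import tempfile

# Keep the file object alive for the lifetime of the process;
# faulthandler writes to its file descriptor when a fatal signal arrives.
crash_log = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)

# Dump the Python traceback of every thread on SIGSEGV, SIGFPE,
# SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=crash_log, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())
print("tracebacks will be written to:", crash_log.name)
```

The same handler can be switched on without editing the script by running it as `python -X faulthandler script.py`, which writes the tracebacks to stderr.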

Solution 1
0 2017-06-13 15:43:12