I am trying to train some models on audio data. I wrote code that loads MP3 files, splits them into short pieces (about 0.1 seconds each), and analyzes those pieces in batches:
import glob

import tensorflow as tf
from tensorflow.contrib import ffmpeg


def load(fname):
    binary = tf.read_file(fname)
    return ffmpeg.decode_audio(binary, file_format='mp3', samples_per_second=44100, channel_count=2)


def preprocess(audio, seconds_per_sample=0.1, rate=44100):
    # pad with at least 1 second of silence front and back,
    # so the total length is a whole number of seconds
    front = tf.zeros([rate, 2], dtype=audio.dtype)
    back = tf.zeros([rate - tf.mod(tf.shape(audio)[0], rate) + rate, 2], dtype=audio.dtype)
    audio = tf.concat([front, audio, back], 0)
    # normalize to 0 to 1 range
    audio = tf.add(audio, tf.abs(tf.reduce_min(audio)))
    audio = tf.multiply(audio, 1.0 / tf.reduce_max(audio))
    # [data, channels] => [samples, data, channels]
    audio = tf.reshape(audio, [-1, int(rate * seconds_per_sample), 2])
    return audio


tf.reset_default_graph()
with tf.Graph().as_default():
    # take files one by one and read data from them
    files = glob.glob('music/*.mp3')
    queue = tf.train.string_input_producer(files, num_epochs=1)
    fname = queue.dequeue()
    audio = load(fname)
    audio = preprocess(audio)
    samples = tf.train.slice_input_producer([audio], num_epochs=1)
    batch = tf.train.batch(samples, 10)
    model = tf.identity(batch)
    init = [tf.global_variables_initializer(), tf.local_variables_initializer()]
    coord = tf.train.Coordinator()
    with tf.Session() as session:
        session.run(init)
        threads = tf.train.start_queue_runners(sess=session, coord=coord)
        for _ in range(10):
            try:
                result = session.run(model)
            except tf.errors.OutOfRangeError:
                coord.request_stop()
        coord.request_stop()
        coord.join(threads)
It seems pretty straightforward, and similar approaches worked for my previous models. I reshape the audio data so that the first dimension becomes samples, use slice_input_producer() to queue the samples up, and then use batch() to feed the samples into the model 10 at a time. For simplicity, I left the model as an identity function. This code makes Python segfault somewhere deep inside TensorFlow. Is there anything I am doing obviously wrong?
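The padding and reshape arithmetic in preprocess() can be checked without TensorFlow. What follows is a minimal pure-Python sketch, not part of the original code: the helper name sample_count and the 3-minute clip length are assumptions made for illustration. It mirrors the lengths used for front, back, and the reshape, and suggests that the padded length always comes out as a whole multiple of rate, so the reshape itself is unlikely to be what fails.

```python
def sample_count(n_frames, rate=44100, seconds_per_sample=0.1):
    """Return (padded_length, number_of_samples) for a clip of n_frames frames.

    Mirrors only the shape arithmetic of preprocess(); no audio is touched.
    """
    front = rate                            # 1 second of silence in front
    back = rate - n_frames % rate + rate    # fill up to a whole second, plus 1 second
    padded = front + n_frames + back
    chunk = int(rate * seconds_per_sample)  # frames per sample (4410 by default)
    if padded % chunk != 0:
        raise ValueError("padded length is not divisible by the chunk size")
    return padded, padded // chunk


# Hypothetical 3-minute clip at 44.1 kHz (an assumed length, for illustration)
padded, count = sample_count(3 * 60 * 44100)
print(padded, count)
```

With the default arguments, every clip length yields a padded length divisible by 4410, so the first dimension produced by the reshape is well defined.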
Here is the start of the OS X crash report:
Process: Python [57865]
Path: /usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: 3.6.1 (3.6.1)
Code Type: X86-64 (Native)
Parent Process: Python [57654]
Responsible: Python [57865]
User ID: 502
Date/Time: 2017-04-12 16:07:13.318 -0400
OS Version: Mac OS X 10.12.3 (16D32)
Report Version: 12
Anonymous UUID: B5DE676B-FEC7-9626-B1CC-F392948D410C
Sleep/Wake UUID: F3A5360E-B7A0-4675-9DC9-EAEE938E2E70
Time Awake Since Boot: 440000 seconds
Time Since Wake: 16000 seconds
System Integrity Protection: disabled
Crashed Thread: 16
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
abort() called
Edit: the issue I opened on GitHub was closed with no explanation other than "see issue tracker policy". I am not sure what else I can do here. If anyone can shed any light on this problem, please do.
Before I could run the code you posted on my computer, I had to add an MP3 file to the "music" folder. I assume you already have some audio there, but please also check the ffmpeg binary: TensorFlow expects ffmpeg to be in the /usr/local/sbin/ folder. An ordinary symbolic link worked for me:
ln -s /usr/bin/ffmpeg /usr/local/sbin/ffmpeg
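To see whether the link is needed on a given machine, a small check along these lines may help. It is only a sketch: the helper name locate_ffmpeg is hypothetical, and /usr/local/sbin is taken from the observation above rather than from anything verified in TensorFlow's source.

```python
import os
import shutil


def locate_ffmpeg(expected_dir="/usr/local/sbin"):
    """Return (path_found_on_PATH, present_in_expected_dir) for ffmpeg."""
    on_path = shutil.which("ffmpeg")  # None if ffmpeg is not on PATH
    expected = os.path.join(expected_dir, "ffmpeg")
    # isfile() follows symbolic links, so a working symlink counts as present
    in_expected = os.path.isfile(expected) and os.access(expected, os.X_OK)
    return on_path, in_expected


on_path, in_expected = locate_ffmpeg()
print("ffmpeg on PATH:", on_path)
print("ffmpeg in /usr/local/sbin:", in_expected)
```

If the first value is a path but the second is False, the binary is installed but not where the answer above found TensorFlow looking for it, which is the situation the symbolic link addresses.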
If this answer does not help, please provide more information by running the code in a terminal emulator and posting the traceback here.
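Because the crash report shows abort() (SIGABRT) rather than a Python exception, an ordinary traceback may not be printed. One way to get one is the standard-library faulthandler module, which dumps the Python stack of every thread when a fatal signal such as SIGSEGV or SIGABRT arrives. The following is a minimal sketch, assuming Python 3.3 or later; the helper name enable_crash_traceback and the log file name are hypothetical.

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Write the Python traceback of all threads to log_path on a fatal signal.

    The returned file object must stay open while faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical log location; call this before building the graph
log_path = os.path.join(tempfile.gettempdir(), "tf_crash_traceback.log")
log_file = enable_crash_traceback(log_path)
print("faulthandler enabled:", faulthandler.is_enabled())
```

Running the script as `python3 -X faulthandler script.py` enables the same handler without code changes and writes to stderr. The dump only shows Python frames, so it indicates which session.run() call or queue-runner thread was active, not the location inside the native code.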