
Using tf.keras.utils.Sequence with model.fit_generator and use_multiprocessing=True generates a warning

This is the warning I got:

WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.

The Sequence subclass I wrote does nothing but I/O: it loads and reads .jpg files. I assume that as long as no two threads read the same file at the same time, things should be OK.

I have trained for a few epochs and so far there has been no error, but I would like feedback on whether something bad could still happen.

In TensorFlow 2.0, using keras.utils.Sequence with use_multiprocessing=True could cause training to hang because of a deadlock. The warning was added later, in TensorFlow 2.1, to address this concern.

# use_multiprocessing=False works.
# use_multiprocessing=True hangs in a deadlock situation.
model.fit_generator(generator, use_multiprocessing=True, workers=2)  

You can ignore this warning, since you are not doing any processing that would create a deadlock situation.

In general, a necessary condition for a deadlock is that a process has exclusive access to one resource while waiting for another (source).

That means that if your Sequence (or generator) class only holds access to a single resource at a time (e.g. a .jpg image file), no deadlock can occur. Likewise, if you are reading data from memory with no locking (e.g. read-only data), no deadlock can occur, because nothing is held exclusively.
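To make the "hold one resource and wait for another" condition concrete, here is a minimal sketch using only Python's standard library. It uses threads and threading.Lock as stand-ins for Keras worker processes and real resources, which is an assumption made purely for illustration. The function name run_workers and the 0.2 second timeout are hypothetical. The timeout is there so the demo reports the stuck state instead of hanging.

```python
import threading


def run_workers(hold_and_wait):
    """Run two workers, each holding its own lock.

    If hold_and_wait is True, each worker also requests the other worker's
    lock while still holding its own. Returns one bool per worker: True if
    the worker finished without timing out, False if it timed out (without
    the timeout it would have been stuck in a deadlock).
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)  # keeps both workers in lockstep
    results = [None, None]

    def worker(idx, own, other):
        with own:                    # exclusive access to one resource
            barrier.wait()           # both workers now hold their own lock
            if hold_and_wait:
                got = other.acquire(timeout=0.2)  # ... while waiting for another
                if got:
                    other.release()
            else:
                got = True           # nothing else is needed, so nothing to wait for
            results[idx] = got
            barrier.wait()           # release only after both workers have tried

    threads = [
        threading.Thread(target=worker, args=(0, lock_a, lock_b)),
        threading.Thread(target=worker, args=(1, lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("single resource each:", run_workers(hold_and_wait=False))
    print("hold one, wait for another:", run_workers(hold_and_wait=True))
```

When each worker only ever touches its own resource, both finish. When each holds one lock and asks for the other, neither can proceed. A Sequence that just opens, reads and closes one file at a time corresponds to the first case.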

In other words: the warning probably does not apply unless your Sequence or generator reads or modifies several pieces of data under locking (that is, in a thread-safe manner).
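As a rough illustration of a loader that stays on the safe side of that condition, here is a sketch of a read-only batch loader. It is not your actual class. ReadOnlyBatchLoader, make_demo_files and load_all are hypothetical names. The class deliberately does not subclass tf.keras.utils.Sequence, so the example runs without TensorFlow, but it has the same __len__/__getitem__ shape. A thread pool stands in for the Keras workers, and the files hold placeholder bytes, not real JPEG data.

```python
import math
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class ReadOnlyBatchLoader:
    """Hypothetical loader shaped like a Sequence: __len__ plus __getitem__."""

    def __init__(self, paths, batch_size):
        self.paths = list(paths)
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        batch = []
        for path in self.paths[start:start + self.batch_size]:
            # Each file is opened read-only and closed before the next one
            # is opened, so a worker never holds one file while waiting
            # for another, and no locks are taken.
            with open(path, "rb") as f:
                batch.append(f.read())
        return batch


def make_demo_files(directory, count):
    """Write small placeholder files (not real JPEGs) and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"img_{i}.jpg")
        with open(path, "wb") as f:
            f.write(bytes([i]) * 16)
        paths.append(path)
    return paths


def load_all(loader, workers):
    """Fetch every batch concurrently; pool.map keeps the batches in index order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(loader.__getitem__, range(len(loader))))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        loader = ReadOnlyBatchLoader(make_demo_files(tmp, 5), batch_size=2)
        print(len(loader), "batches loaded:", len(load_all(loader, workers=2)))
```

The loader shares no mutable state between calls to __getitem__, so concurrent workers return the same batches as a sequential loop would.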

I trained two models with the same hyperparameters: one is a classification model and the other a regression model. The classification model trained with use_multiprocessing=True without any warning. Training the regression model with MSE loss, however, produced the warning, slowly consumed all memory, and eventually hung the system. I had to turn off use_multiprocessing, after which training worked and used almost 50% of system memory. Note that the same batch size was used for training both models, although the classification model has more parameters than the regression model.
