I use the mirrored strategy in my TensorFlow 2 code, as described in this tutorial: https://www.tensorflow.org/guide/distributed_training . My code is almost exactly the same, and the setup has been working well for about 1.5 years. I regularly put the function
@tf.function
def distributed_train_step(dist_inputs):
in eager mode for debugging by simply commenting out the @tf.function decorator, which worked well until now. This morning when I started the debugger, I got the following error message: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV). When I put the @tf.function back in, everything works well; the crash only happens in eager mode. I even reset all my code and restored an old git commit that I know works perfectly fine. Can someone explain why this error suddenly occurs in eager mode? I'm a bit lost here.
Can someone explain why this error suddenly occurs in eager mode?
It's a bug in TF, possibly the one fixed by this commit.
But it's hard to tell without a crash stack trace.
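One way to get at least a Python-level stack trace for the crash is the standard-library faulthandler module, which dumps the traceback of every thread when the process receives SIGSEGV. The snippet below is only a sketch: the helper name enable_crash_traces and the log file name crash_trace.log are made up for illustration, and the output shows Python frames only, not the native C++ frames inside TensorFlow.

```python
import faulthandler


def enable_crash_traces(log_path="crash_trace.log"):
    """Dump Python tracebacks to log_path on SIGSEGV and similar fatal signals.

    Hypothetical helper: the name and the default path are illustrative only.
    """
    # The file must stay open for the lifetime of the process, because
    # faulthandler writes to its file descriptor at crash time.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this at the very top of the training script, before importing
# TensorFlow and before running distributed_train_step in eager mode.
crash_log = enable_crash_traces()
```

The same handler can be switched on without touching the code by starting the interpreter with python -X faulthandler or by setting the environment variable PYTHONFAULTHANDLER=1; in that case the traceback goes to stderr. The last Python frame in the dump usually shows which TensorFlow call triggered the segfault, which makes it easier to match the crash against a known bug.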