简体   繁体   中英

segmentation fault in tensorflow eager execution?

I use the mirrored strategy in my Tensorflow2 code, as described in this tutorial: https://www.tensorflow.org/guide/distributed_training . I have almost the same exact code, and the setup is working well for about 1.5 years now. I regularly put the function call

@tf.function
def distributed_train_step(dist_inputs):

in the eager mode for debugging purposes by simply commenting the @tf.function, worked well until now. This morning when I started the debugger, I got the following error message: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV). When I put the @tf.function in again, everything works well, it's just in the eager mode. I did even reset all my code and restored an old git commit which I know is working perfectly fine. Can someone explain why this error suddenly occurs in eager mode? I'm a bit lost here..

Can someone explain why this error suddenly occurs in eager mode?

It's a bug in TF, possibly the one fixed by this commit .

But it's hard to tell without a crash stack trace.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM