segmentation fault in tensorflow eager execution?

Question

I use the mirrored strategy in my Tensorflow2 code, as described in this tutorial: https://www.tensorflow.org/guide/distributed_training . I have almost the same exact code, and the setup is working well for about 1.5 years now. I regularly put the function call

@tf.function
def distributed_train_step(dist_inputs):

in the eager mode for debugging purposes by simply commenting the @tf.function, worked well until now. This morning when I started the debugger, I got the following error message: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV). When I put the @tf.function in again, everything works well, it's just in the eager mode. I did even reset all my code and restored an old git commit which I know is working perfectly fine. Can someone explain why this error suddenly occurs in eager mode? I'm a bit lost here..

Answer 1

Can someone explain why this error suddenly occurs in eager mode?

It's a bug in TF, possibly the one fixed by this commit .

But it's hard to tell without a crash stack trace.

segmentation fault in tensorflow eager execution?

Question

1 answers

solution1
0 2022-02-27 19:42:46

segmentation fault in tensorflow eager execution?

Question

1 answers

solution1 0 2022-02-27 19:42:46

solution1
0 2022-02-27 19:42:46