
Tensorflow multi-threading inference slower than single-threaded inference

I'm attempting to implement multi-threaded inference with 3 TensorFlow sessions (using 3 threads), like below:

import threading

def test_tf(sess, t_num, y_op, x_inp, input_list, tflag_op):
    # Each thread runs inference in its own session.
    sess.run(y_op, {x_inp: input_list, tflag_op: False})

threads_list = []
for i, each_sess in enumerate(cr_sessions):
    t = threading.Thread(target=test_tf,
                         args=(each_sess, i, y_op, x_inp, input_list, tflag_op))
    threads_list.append(t)
    t.start()

# Wait for all threads only after they have all been started.
for t in threads_list:
    t.join()

I timed the duration of each thread and they came out to be like this:

Thread 0 duration: 0.478595900000073

Thread 1 duration: 0.4760909999999967

Thread 2 duration: 0.47291089999998803

Total duration of 3 threads: 0.4847196000000622

I then compared it with just running inference sequentially (with the below times):

Iteration 0 duration: 0.1481448999998065

Iteration 1 duration: 0.1493705999996564

Iteration 2 duration: 0.14735560000008263

Iteration total duration: 0.44588549999980387

It seems to me that my multi-threaded inference isn't actually running in parallel: each thread appears to complete most of its work, then wait until all the others are done before they finish one right after another. I understand that in most Python cases this would be because the GIL is not released by each thread, but I've read that TensorFlow's session.run() does release the GIL (so it can run in parallel).

Can anyone tell me whether my interpretation of this is correct? What am I missing to actually make the threads run in parallel (assuming they're not)?

The issue is likely the overhead of creating the threads and waiting for them to finish. To test this, I would suggest increasing the amount of data fed through the operation so that each thread runs longer, and checking whether the parallel version starts to beat the sequential one for a large enough workload, as in the sketch below.
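For example, something along these lines could compare the two approaches as the workload grows (a minimal sketch: build_session, run_inference, and the small matmul graph are hypothetical stand-ins for the original sessions and ops, not the question's actual model):

import threading
import time

import numpy as np
import tensorflow as tf  # written against the 1.x Session API (tf.compat.v1 in TF 2.x)

def build_session():
    # Hypothetical stand-in model: each session owns its own small matmul graph.
    graph = tf.Graph()
    with graph.as_default():
        x_inp = tf.placeholder(tf.float32, shape=[None, 1024], name="x")
        w = tf.Variable(tf.random_normal([1024, 1024]))
        y_op = tf.matmul(x_inp, w)
        sess = tf.Session(graph=graph)
        sess.run(tf.global_variables_initializer())
    return sess, x_inp, y_op

def run_inference(sess, x_inp, y_op, data):
    sess.run(y_op, {x_inp: data})

sessions = [build_session() for _ in range(3)]

for n in (1000, 5000, 20000):  # grow the workload per run
    data = np.random.rand(n, 1024).astype(np.float32)

    # Sequential baseline: one session.run after another.
    start = time.perf_counter()
    for sess, x_inp, y_op in sessions:
        run_inference(sess, x_inp, y_op, data)
    seq = time.perf_counter() - start

    # Threaded version: start all threads, join only after all are started.
    threads = [threading.Thread(target=run_inference, args=(s, x, y, data))
               for s, x, y in sessions]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    par = time.perf_counter() - start

    print("n=%d  sequential=%.3fs  threaded=%.3fs" % (n, seq, par))

If the threaded time only pulls ahead at the larger sizes, that points to thread start-up and scheduling overhead dominating the short runs you measured.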

As a side note, it might be faster to group your data together and run it through a single session instead, since the graph operations are already parallelized inside TensorFlow. In my experience, combining Python threading with TensorFlow does not produce the expected speedup.
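A minimal sketch of that idea, assuming all inputs go through the same graph (the matmul graph, the ConfigProto thread settings, and the chunk sizes are illustrative, not taken from the question's code):

import numpy as np
import tensorflow as tf  # 1.x Session API

# Let TensorFlow's own thread pools do the parallel work instead of Python threads.
# 0 means "let TensorFlow choose" for both pools.
config = tf.ConfigProto(intra_op_parallelism_threads=0,
                        inter_op_parallelism_threads=0)

graph = tf.Graph()
with graph.as_default():
    x_inp = tf.placeholder(tf.float32, shape=[None, 1024], name="x")
    w = tf.Variable(tf.random_normal([1024, 1024]))
    y_op = tf.matmul(x_inp, w)
    sess = tf.Session(graph=graph, config=config)
    sess.run(tf.global_variables_initializer())

# Three per-thread inputs become one batched feed for a single session.run call.
input_chunks = [np.random.rand(1000, 1024).astype(np.float32) for _ in range(3)]
batched = np.concatenate(input_chunks, axis=0)

outputs = sess.run(y_op, {x_inp: batched})

# Split the result back into the original three pieces if needed.
sizes = [chunk.shape[0] for chunk in input_chunks]
per_input_outputs = np.split(outputs, np.cumsum(sizes)[:-1], axis=0)

Since individual ops such as the matrix multiply already run on TensorFlow's intra-op thread pool, one larger session.run call usually keeps the CPU busier than several Python threads competing for the same cores.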
