Tensorflow slower on Local GPU than Colab GPU on LSTM

Question

There are many questions about the Colab GPU being slower, but in my case it's the opposite. I tried training an LSTM with TensorFlow 2.4.0 on:

My local GPU: NVIDIA GeForce GTX 1660 Ti (compute capability 7.5)
Colab GPU: Tesla K80 (compute capability 3.7)

On Colab, a single epoch takes around 3.5 minutes, whereas it takes 10.5 minutes on my GPU. I've seen benchmarks saying the 1660 Ti should be much faster than the Tesla K80, so I can't figure out what's causing the issue. I've tried various versions of the NVIDIA driver, cuDNN and CUDA, but there seems to be no difference.

Benchmark: http://ai-benchmark.com/ranking.html

Model Description:

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_2 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 50)     1659000     input_2[0][0]                    
__________________________________________________________________________________________________
input_3 (InputLayer)            [(None, 64)]         0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            [(None, 64)]         0                                            
__________________________________________________________________________________________________
lstm_2 (LSTM)                   (None, None, 64)     29440       embedding_1[1][0]                
                                                                 input_3[0][0]                    
                                                                 input_4[0][0]                    
__________________________________________________________________________________________________
lstm_3 (LSTM)                   [(None, None, 64), ( 33024       lstm_2[1][0]                     
__________________________________________________________________________________________________
dense (Dense)                   (None, None, 33180)  2156700     lstm_3[1][0]                     
==================================================================================================
Total params: 3,878,164
Trainable params: 2,219,164
Non-trainable params: 1,659,000
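As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is a minimal sketch assuming the standard Keras LSTM layout (four gates, each with an input kernel, a recurrent kernel and a bias), a vocabulary of 33,180 inferred from the Dense output width, and that only the embedding is frozen; none of these settings are stated explicitly in the question.

```python
# Recompute the parameter counts in the "model_2" summary by hand.
# Assumptions (not stated in the question): vocabulary size 33,180 (read off
# the Dense output width), a frozen embedding, and default Keras LSTMs
# (use_bias=True, four gates).

VOCAB_SIZE = 33_180   # width of the Dense output layer
EMBED_DIM = 50        # embedding_1 output dimension
UNITS = 64            # hidden size of lstm_2 and lstm_3


def embedding_params(vocab: int, dim: int) -> int:
    # One dim-sized vector per vocabulary entry.
    return vocab * dim


def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with an input kernel, a recurrent kernel and a bias.
    return 4 * units * (input_dim + units + 1)


def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix plus one bias per output unit.
    return units * (input_dim + 1)


counts = {
    "embedding_1": embedding_params(VOCAB_SIZE, EMBED_DIM),
    "lstm_2": lstm_params(EMBED_DIM, UNITS),
    "lstm_3": lstm_params(UNITS, UNITS),
    "dense": dense_params(UNITS, VOCAB_SIZE),
}
total = sum(counts.values())
trainable = total - counts["embedding_1"]  # assumes only the embedding is frozen

for name, n in counts.items():
    print(f"{name:12s} {n:>10,}")
print(f"{'total':12s} {total:>10,}")
print(f"{'trainable':12s} {trainable:>10,}")
```

The computed totals match the summary (3,878,164 total, 2,219,164 trainable), and they show that the output Dense layer accounts for almost all of the trainable parameters.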

Does Colab have an optimized version of TensorFlow? Or does this have something to do with the OS, since I'm using Windows? Please help me with this.

Answer 1

From the comments:

I think the GPU comparison makes no sense: your local GPU is a laptop one, while the Tesla K80 is a datacenter GPU with a very different thermal profile, and the K80 has more than three times as many compute cores as the 1660 Ti. All benchmarks only approximate the workloads one might run, so it is not surprising that the workload in this benchmark is biased toward INT8, at which the K80 is not very good, whereas neural networks are trained in FP32. In the end, the K80 is a much faster GPU than your laptop one.

I suspected as much, but my GPU runs CNNs faster than the Tesla K80 does (the MNIST example), which is why I asked this question.

Sure, but MNIST CNNs do not really tell you much about performance (the models are too small). My point is that there is no general statement like "GPU A is always faster than GPU B"; it all depends on the workload, and you have just found a workload where the K80 is considerably faster. That is not surprising, considering it has 3x more compute elements.

(paraphrased from Dr.Snoopy)
