
Theano GPU calculation slower than NumPy

I'm learning to use Theano. I want to populate a term-document matrix (a sparse matrix) by calculating the binary TF-IDF of each element in it:

import theano
import theano.tensor as T
import numpy as np
from time import perf_counter

def tfidf_gpu(appearance_in_documents,num_documents,document_words):
    start = perf_counter()
    # Build the symbolic graph from three int32 scalar inputs
    APP = T.scalar('APP',dtype='int32')  # number of documents containing the term
    N = T.scalar('N',dtype='int32')      # total number of documents
    SF = T.scalar('S',dtype='int32')     # number of words in the document
    F = (T.log(N)-T.log(APP)) / SF
    # theano.function compiles the graph; here that happens on every call,
    # inside the timed region
    TFIDF = theano.function([N,APP,SF],F)
    ret = TFIDF(num_documents,appearance_in_documents,document_words)
    end = perf_counter()
    print("\nTFIDF_GPU ",end-start," secs.")
    return ret

def tfidf_cpu(appearance_in_documents,num_documents,document_words):
    start = perf_counter()
    # Same formula evaluated directly by NumPy (no graph building or compilation)
    tfidf = (np.log(num_documents)-np.log(appearance_in_documents))/document_words
    end = perf_counter()
    print("TFIDF_CPU ",end-start," secs.\n")
    return tfidf
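Written out, the per-element weight that both functions compute is the following. The symbol names are introduced here for readability: $N$ is `num_documents`, $n_t$ is `appearance_in_documents` for term $t$, and $|d|$ is `document_words` for document $d$. Both `T.log` and `np.log` are natural logarithms.

```latex
w_{t,d} = \frac{\ln N - \ln n_t}{|d|} = \frac{1}{|d|}\,\ln\frac{N}{n_t}
```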

But the NumPy version is much faster than the Theano implementation, by more than three orders of magnitude in these runs:

Progress 1/43
TFIDF_GPU  0.05702276699594222  secs.
TFIDF_CPU  1.454801531508565e-05  secs.

Progress 2/43
TFIDF_GPU  0.023830442980397493  secs.
TFIDF_CPU  1.1073017958551645e-05  secs.

Progress 3/43
TFIDF_GPU  0.021920352999586612  secs.
TFIDF_CPU  1.0738993296399713e-05  secs.

Progress 4/43
TFIDF_GPU  0.02303648801171221  secs.
TFIDF_CPU  1.1675001587718725e-05  secs.

Progress 5/43
TFIDF_GPU  0.02359767400776036  secs.
TFIDF_CPU  1.4385004760697484e-05  secs.

....
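To put the gap in numbers, the sketch below simply divides the GPU time by the CPU time for the five progress steps listed above. The values are copied from that output; nothing is re-measured.

```python
# Timings (seconds) copied from the five "Progress" steps above: (GPU, CPU)
timings = [
    (0.05702276699594222, 1.454801531508565e-05),
    (0.023830442980397493, 1.1073017958551645e-05),
    (0.021920352999586612, 1.0738993296399713e-05),
    (0.02303648801171221, 1.1675001587718725e-05),
    (0.02359767400776036, 1.4385004760697484e-05),
]

# How many times slower the Theano call is than the NumPy expression
ratios = [gpu / cpu for gpu, cpu in timings]

for step, ratio in enumerate(ratios, start=1):
    print(f"Progress {step}/43: Theano/NumPy = {ratio:,.0f}x")
```

The first step is the slowest (about 3,900x); the later steps sit between roughly 1,600x and 2,200x.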

I've read that this can be caused by overhead, which can kill performance for small operations.

Is my code bad, or should I avoid using the GPU because of the overhead?

The problem is that you compile your Theano function on every call, and compilation takes time. Compile it once and pass the compiled function in instead, like this:

# Times only the call to an already-compiled function
def tfidf_gpu(appearance_in_documents,num_documents,document_words,TFIDF):
    start = perf_counter()
    ret = TFIDF(num_documents,appearance_in_documents,document_words)
    end = perf_counter()
    print("\nTFIDF_GPU ",end-start," secs.")
    return ret

# Build and compile the graph once, outside the timed function
APP = T.scalar('APP',dtype='int32')
N = T.scalar('N',dtype='int32')
SF = T.scalar('S',dtype='int32')
F = (T.log(N)-T.log(APP)) / SF
TFIDF = theano.function([N,APP,SF],F)

# Reuse the same compiled TFIDF for every document
tfidf_gpu(appearance_in_documents,num_documents,document_words,TFIDF)
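The effect can be reproduced without Theano. As a hypothetical stand-in for `theano.function`, the sketch below uses Python's built-in `compile()`, which turns an expression string into a code object that `eval()` then runs; the function names are made up for illustration. One version compiles inside every call (like the question's code), the other compiles once and reuses the result (like the fix above).

```python
import math
from time import perf_counter

# Same formula as the question, written as source text to be compiled.
EXPR = "(math.log(N) - math.log(APP)) / SF"
GLOBALS = {"math": math}

def tfidf_compile_each_call(n, app, sf):
    # Pays the compilation cost on every call (mirrors the question's code).
    code = compile(EXPR, "<tfidf>", "eval")
    return eval(code, GLOBALS, {"N": n, "APP": app, "SF": sf})

# Compiled once, outside any timed region (mirrors the answer's fix).
TFIDF_CODE = compile(EXPR, "<tfidf>", "eval")

def tfidf_precompiled(n, app, sf):
    # Only evaluates the already-compiled code object.
    return eval(TFIDF_CODE, GLOBALS, {"N": n, "APP": app, "SF": sf})

def total_time(fn, calls=20_000):
    # Hypothetical inputs: 43 documents, term in 5 of them, 120 words.
    start = perf_counter()
    for _ in range(calls):
        fn(43, 5, 120)
    return perf_counter() - start

if __name__ == "__main__":
    print("compile each call:", total_time(tfidf_compile_each_call), "s")
    print("precompiled:      ", total_time(tfidf_precompiled), "s")
```

Absolute numbers depend on the machine, but the compile-each-call version should be consistently slower. Theano's compilation step does far more work than this (graph optimization and code generation), so the penalty in the question is correspondingly larger.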

Also, your TF-IDF task is bandwidth-intensive. Theano, and GPUs in general, are best suited to compute-intensive tasks.
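A back-of-the-envelope cost model makes the bandwidth argument concrete. This is a deliberate simplification, not a measured model: it ignores overlap of transfers with computation and CPU-side memory costs. Let $t_{\text{call}}$ be the fixed per-call overhead, $b$ the bytes moved to and from the GPU, $B$ the host-device bandwidth, $W$ the amount of arithmetic, and $F_{\text{GPU}}$, $F_{\text{CPU}}$ the respective arithmetic throughputs.

```latex
T_{\text{GPU}} \approx t_{\text{call}} + \frac{b}{B} + \frac{W}{F_{\text{GPU}}},
\qquad
T_{\text{CPU}} \approx \frac{W}{F_{\text{CPU}}}
\\[4pt]
T_{\text{GPU}} < T_{\text{CPU}}
\iff
W\left(\frac{1}{F_{\text{CPU}}} - \frac{1}{F_{\text{GPU}}}\right) > t_{\text{call}} + \frac{b}{B}
```

For an elementwise formula like this TF-IDF, $W$ grows only in proportion to $b$ (a few operations per element moved), so the right-hand side is hard to beat; a compute-heavy task raises $W$ without raising $b$.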

For the current task, moving the data to the GPU and back adds considerable overhead, because in the end each element is read only O(1) times, so there is too little arithmetic per element to pay for the transfer. If you want to do more computation on the data, however, using the GPU makes sense.
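In the question's code each call also handles only three scalars, so Theano's fixed per-call overhead dominates. A common way to amortize per-call overhead is to compute the weights for all terms and documents in one vectorized call. The sketch below does this with NumPy on a small dense 0/1 term-document matrix; the helper name `binary_tfidf` and the toy data are hypothetical, and a real corpus would more likely be stored as a sparse matrix.

```python
import numpy as np

def binary_tfidf(presence, document_words):
    """Hypothetical helper: binary TF-IDF for a whole term-document matrix.

    presence[t, d] is 1 if term t occurs in document d, else 0.
    document_words[d] is the word count of document d.
    Assumes every term occurs in at least one document and every
    document has at least one word (otherwise log(0) / division by zero).
    """
    presence = np.asarray(presence, dtype=np.float64)
    document_words = np.asarray(document_words, dtype=np.float64)
    num_documents = presence.shape[1]                   # N
    appearance = presence.sum(axis=1)                   # n_t for every term
    idf = np.log(num_documents) - np.log(appearance)    # shape (terms,)
    # Broadcast: row t scaled by idf[t], column d divided by document_words[d].
    weights = idf[:, None] / document_words[None, :]
    return weights * presence                           # keep only present terms

if __name__ == "__main__":
    presence = np.array([[1, 0, 1],
                         [1, 1, 1],
                         [0, 1, 0]])
    document_words = np.array([4, 2, 5])
    print(binary_tfidf(presence, document_words))
```

Each nonzero entry equals what `tfidf_cpu` would return for that (term, document) pair, but the work happens in a single call. Batching removes the per-call overhead; the computation is still elementwise and bandwidth-bound, so the point above about the GPU still applies.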

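Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.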

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM