
How does TensorFlow GPU/multi-GPU allocate memory?

I have two questions:

(1) How does TensorFlow allocate GPU memory when using only one GPU? I have an implementation of 2-D convolution like this (globally placed on the GPU):

def _conv(self, name, x, filter_size, in_filters, out_filters, strides):
    with tf.variable_scope(name):
        n = filter_size * filter_size * out_filters
        kernel = tf.get_variable(
            '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
            initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n)),
        )
        return tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # another option
        # x = tf.nn.conv2d(x, kernel, strides, padding='SAME')
        # return x

The other option in the comments performs the same operation but first binds the result to a new variable x. In this case, will TF allocate more GPU memory?
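To make the question concrete, here is a minimal sketch (assuming TF 1.x; the shapes and the variable name 'k' are made up, not the code above): both variants end up returning the same symbolic Tensor from tf.nn.conv2d, and the question is whether binding it to the extra Python name x costs any GPU memory.

import tensorflow as tf

inp = tf.placeholder(tf.float32, [None, 8, 8, 3])
kernel = tf.get_variable('k', [3, 3, 3, 16], tf.float32)
x = tf.nn.conv2d(inp, kernel, [1, 1, 1, 1], padding='SAME')  # the "another option" style
# x is only a handle to a graph node; no data is attached to it until the graph runs
print(x)  # e.g. Tensor("Conv2D:0", shape=(?, 8, 8, 16), dtype=float32)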

(2) When using multiple GPUs, I'd like to use a list to gather the results from the individual GPUs. The implementation is below:

def _conv(self, name, input, filter_size, in_filters, out_filters, strides, trainable=True):
    assert type(input) is list
    assert len(input) == FLAGS.gpu_num

    n = filter_size * filter_size * out_filters
    output = []
    for i in range(len(input)):
        with tf.device('/gpu:%d' % i):
            with tf.variable_scope(name, reuse=i > 0):
                kernel = tf.get_variable(
                    '', [filter_size, filter_size, in_filters, out_filters], tf.float32,
                    initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0 / n))
                )
                output.append(tf.nn.conv2d(input[i], kernel, strides, padding='SAME'))

    return output

Will TF allocate more memory because of the use of the list? Is output (the list) attached to some GPU device? I ask because, when I train the CNN with this implementation on two GPUs, the program uses much more GPU memory than when using one GPU. I think there is something I missed or misunderstood.

Use this code to check each tensor and the device it is attached to:

for n in tf.get_default_graph().as_graph_def().node:
    print(n.name, n.device)
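A related check (a sketch, assuming TF 1.x): listing the variables shows that, thanks to reuse=i > 0, each kernel is created once and shared by all towers, so duplicated weights are not where the extra multi-GPU memory comes from.

import tensorflow as tf

for v in tf.global_variables():
    print(v.name, v.device)  # each shared kernel should appear exactly once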

So the answers to these two questions are:

(1) No.

(2) If I want to gather the intermediate data across GPUs, and that data is used to compute gradients, there will be problems, because computing gradients consumes memory too. Whenever data is accessed across GPUs, additional memory is allocated.
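For reference, a sketch of the usual in-graph data-parallel pattern (assuming TF 1.x; build_tower, the layer sizes, and the use of two towers are made up): each GPU keeps its own activations and gradients, and only the comparatively small gradient tensors are gathered (here onto the CPU) for averaging, so the large intermediate tensors never have to be copied across devices.

import tensorflow as tf

def build_tower(images, labels):
    # hypothetical per-GPU forward pass + loss; variables are shared via reuse
    logits = tf.layers.dense(tf.layers.flatten(images), 10, name='fc')
    return tf.losses.sparse_softmax_cross_entropy(labels, logits)

opt = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []
for i in range(2):  # one tower per GPU; 2 stands in for FLAGS.gpu_num
    images = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.int64, [None])
    with tf.device('/gpu:%d' % i), tf.variable_scope('model', reuse=i > 0):
        loss = build_tower(images, labels)
        # gradient ops created here are placed on gpu:i as well
        tower_grads.append(opt.compute_gradients(loss))

with tf.device('/cpu:0'):
    # average the per-variable gradients across towers, then apply them once
    avg_grads = [(tf.reduce_mean(tf.stack([g for g, _ in gv]), axis=0), gv[0][1])
                 for gv in zip(*tower_grads)]
    train_op = opt.apply_gradients(avg_grads)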
