TensorFlow Word2Vec model running on GPU

Question

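This TensorFlow example describes training a skip-gram Word2Vec model. It contains the following code snippet, which explicitly requires a CPU device for the computations, i.e. tf.device('/cpu:0'):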

batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
skip_window = 1  # How many words to consider left and right.
num_skips = 2  # How many times to reuse an input to generate a label.

# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent. 
valid_size = 16  # Random set of words to evaluate similarity on.
valid_window = 100  # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))
num_sampled = 64  # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default(), tf.device('/cpu:0'):
    # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

    # Variables.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

    # Model.
    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)

    # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                   biases=softmax_biases, inputs=embed,
                                   labels=train_labels, num_sampled=num_sampled,
                                   num_classes=vocabulary_size))

    # Optimizer.
    # Note: The optimizer will optimize the softmax_weights AND the embeddings.
    # This is because the embeddings are defined as a variable quantity and the
    # optimizer's `minimize` method will by default modify all variable quantities 
    # that contribute to the tensor it is passed.
    # See docs on `tf.train.Optimizer.minimize()` for more details.
    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)

    # Compute the similarity between minibatch examples and all embeddings.
    # We use the cosine distance:
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))

When I try to switch to the GPU, the following exception is raised:

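InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable_2/Adagrad': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.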

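Why can't the provided graph be computed on a GPU? Is it because of the tf.int32 type, or should I switch to another optimizer? In other words, is there any way to train the Word2Vec model on a GPU (without type casting)?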

Update

Following Akshay Agrawal's suggestion, here is an updated snippet of the original code that achieves the desired result:

with graph.as_default(), tf.device('/gpu:0'):
    # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))    
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)

    with tf.device('/cpu:0'):
        loss = tf.reduce_mean(
            tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                       biases=softmax_biases,
                                       inputs=embed,
                                       labels=train_labels,
                                       num_sampled=num_sampled,
                                       num_classes=vocabulary_size))

    optimizer = tf.train.AdamOptimizer(0.001).minimize(loss)

    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))

Answer 1

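The error is raised because AdagradOptimizer does not have a GPU kernel for its sparse apply operation. A sparse apply is triggered because differentiating through the embedding lookup results in a sparse gradient.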

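GradientDescentOptimizer and AdamOptimizer do support sparse apply operations. If you switch to one of these optimizers, you will most likely see another error: tf.nn.sampled_softmax_loss appears to create an op that has no GPU kernel. To get around that, you can wrap your loss = tf.reduce_mean(... line in a with tf.device('/cpu:0'): context, though doing so introduces CPU-GPU communication overhead.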
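To make the "sparse gradient" and "sparse apply" terms above more concrete, here is a minimal NumPy sketch. It is not TensorFlow's kernel: the helper names embedding_lookup_grad and sparse_adagrad_apply, the toy sizes, and the initial accumulator value of 0.1 are all assumptions made for illustration. It only shows that the gradient of an embedding lookup touches just the looked-up rows (TensorFlow represents such gradients as tf.IndexedSlices), and that an Adagrad-style update can then be applied to those rows alone.

```python
import numpy as np


def embedding_lookup_grad(indices, upstream_grad):
    """Gradient of rows = embeddings[indices] with respect to `embeddings`.

    Returned in sparse form as (indices, rows): only the looked-up rows
    receive gradient; every other row of the table is implicitly zero.
    """
    return np.asarray(indices).ravel(), np.asarray(upstream_grad)


def sparse_adagrad_apply(params, accum, indices, grad_rows, lr=1.0):
    """Simplified Adagrad step that touches only the rows in `indices`."""
    # A word may occur several times in a batch, so sum its gradient rows first.
    unique, inverse = np.unique(indices, return_inverse=True)
    summed = np.zeros((unique.size, params.shape[1]), dtype=params.dtype)
    np.add.at(summed, inverse.ravel(), grad_rows)
    # Adagrad: accumulate squared gradients, then scale the step by their root.
    accum[unique] += summed ** 2
    params[unique] -= lr * summed / np.sqrt(accum[unique])
    return unique


vocabulary_size, embedding_size = 10, 4  # toy sizes, for illustration only
embeddings = np.ones((vocabulary_size, embedding_size))
accumulator = np.full_like(embeddings, 0.1)  # assumed initial accumulator value

batch = [2, 5, 2]  # word IDs in one minibatch
upstream = np.ones((len(batch), embedding_size))  # stand-in for dLoss/dEmbed

idx, rows = embedding_lookup_grad(batch, upstream)
touched = sparse_adagrad_apply(embeddings, accumulator, idx, rows)
print("rows updated:", touched.tolist())
```

Only the rows for word IDs that appear in the batch are modified; the rest of the embedding table is left untouched. An optimizer needs a dedicated kernel for this row-wise update on each device, which is the piece the answer reports as missing on the GPU for Adagrad.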

Answer 1: 2 | accepted | 2017-11-22 18:55:44
