TensorFlow: Enqueue tuple of tensors throws TypeError

I have a time-based dataset that consists of categorical features, real-valued features, a mask that states whether a given feature was present at that time, and a "Deltas" array that contains the length of time since a value was last present.

I want to build a queue of tuples of these tensors so that the categorical features can be converted to one-hot encodings and so that the data, mask, and deltas can be used in different parts of the model. Below is some code that I wrote to do this:

import tensorflow as tf
import threading
import numpy as np

# Function to generate batches of data
def nextBatch(batch_size):
    n_steps = 14
    batch = []
    for _ in range(batch_size):
        # Create tuple of tensors
        ex = (np.random.randint(0,5, (n_steps, 2)),
              np.random.randn(n_steps, 10),
              np.random.randint(0,2, (n_steps, 12)),
              np.random.randint(0,2000, (n_steps, 12)))
        batch.append(ex)

    return batch


# Graph to enqueue data
tf.reset_default_graph()

q = tf.PaddingFIFOQueue(1000,
                        [np.uint16, tf.float32, tf.uint16, tf.uint16],
                        [(None,5), (None,48), (None,53), (None,53)])

def enqueue_op():
    # Stop enqueuing after 11 ops
    i = 0
    while True:
        q.enqueue_many(nextBatch(100))
        i += 1
        if i >11:
            return

# Start enqueuing
t = threading.Thread(target=enqueue_op)
t.start()

When I run this I get a TypeError:

TypeError: Expected uint16, got array(...) of type 'ndarray' instead.

I am not sure what I am doing wrong. Is it the dtype definition when I create my queue?

There are a few problems here:

  1. Your thread is calling q.enqueue_many() repeatedly. Despite its (slightly confusing) name, the q.enqueue_many() method does not immediately enqueue data in the queue; it returns a tf.Operation that must be passed to sess.run() to add the tensors to the queue. The code that runs in the separate thread creates a new enqueue-many operation on each loop iteration and discards it, which is probably not what you intended.

  2. The return value of nextBatch(100) is a list of 100 tuples of 4 arrays. The q.enqueue_many() method expects a single tuple of 4 arrays, each with a leading batch dimension. If you want to enqueue a list of 100 tuples, you'll need to run a q.enqueue() op 100 times, or stack together the 100 arrays for each tuple component so that you have a single tuple of four arrays.

  3. The arrays produced in nextBatch() don't match the shapes of the queue components. Assuming n_steps is the dimension that could be variable (for the purposes of padding), the function should produce arrays of shape (n_steps, 5), (n_steps, 48), (n_steps, 53), and (n_steps, 53) to match the queue definition.
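The stacking option from point 2 can be sketched with NumPy alone. This is a minimal illustration, not part of the original code: the names next_batch and stack_batch are hypothetical, and it assumes every example in the batch shares the same n_steps (examples of different lengths would need to be padded before np.stack could combine them).

```python
import numpy as np

def next_batch(batch_size, n_steps=14):
    """Return a list of `batch_size` 4-tuples whose shapes match the queue components."""
    return [(np.random.randint(0, 5, (n_steps, 5)),
             np.random.randn(n_steps, 48),
             np.random.randint(0, 2, (n_steps, 53)),
             np.random.randint(0, 2000, (n_steps, 53)))
            for _ in range(batch_size)]

def stack_batch(batch):
    """Turn a list of 4-tuples into one 4-tuple of stacked arrays.

    zip(*batch) groups the i-th component of every example together;
    np.stack then adds a leading batch dimension to each group.
    Assumes all examples have identical shapes per component.
    """
    return tuple(np.stack(component) for component in zip(*batch))

stacked = stack_batch(next_batch(100))
for arr in stacked:
    print(arr.shape)
```

Each array in the resulting tuple has the batch as its first dimension, which is the layout an enqueue-many style operation slices along.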

Here's a version of your code that works as I assume you intended:

import tensorflow as tf
import threading
import numpy as np

# Function to generate batches of data
def nextBatch(batch_size):
  n_steps = 14
  batch = []
  for _ in range(batch_size):
    # Create tuple of tensors
    ex = (np.random.randint(0,5, (n_steps, 5)),
          np.random.randn(n_steps, 48),
          np.random.randint(0,2, (n_steps, 53)),
          np.random.randint(0,2000, (n_steps, 53)))
    batch.append(ex)
  return batch

q = tf.PaddingFIFOQueue(1000,
                        [tf.uint16, tf.float32, tf.uint16, tf.uint16],
                        [(None, 5), (None, 48), (None, 53), (None, 53)])

# Define a single op for enqueuing a tuple of placeholder tensors.
placeholders = [tf.placeholder(tf.uint16, shape=(None, 5)),
                tf.placeholder(tf.float32, shape=(None, 48)),
                tf.placeholder(tf.uint16, shape=(None, 53)),
                tf.placeholder(tf.uint16, shape=(None, 53))]
enqueue_op = q.enqueue(placeholders)

# Create a session in order to run the enqueue_op.
sess = tf.Session()

def enqueue_thread_fn():
  for i in range(10):
    batch = nextBatch(100)
    for batch_elem in batch:
      # Each call to `sess.run(enqueue_op, ...)` enqueues a single element in
      # the queue.
      sess.run(enqueue_op, feed_dict={placeholders[0]: batch_elem[0],
                                      placeholders[1]: batch_elem[1],
                                      placeholders[2]: batch_elem[2],
                                      placeholders[3]: batch_elem[3]})

# Start enqueuing
t = threading.Thread(target=enqueue_thread_fn)
t.start()
t.join()
sess.close()
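One optional refinement: the queue declares uint16 and float32 components, while np.random.randint and np.random.randn return NumPy's default integer and float64 arrays. The sketch below generates each example with the declared dtypes up front and checks it against the queue's component spec. The names make_example, EXPECTED, and matches_queue_spec are hypothetical helpers introduced for illustration, not part of the original answer.

```python
import numpy as np

# (dtype, feature width) for each queue component, copied from the queue definition.
EXPECTED = [(np.uint16, 5), (np.float32, 48), (np.uint16, 53), (np.uint16, 53)]

def make_example(n_steps=14):
    """Build one 4-tuple whose dtypes and shapes match the queue components."""
    return (np.random.randint(0, 5, (n_steps, 5), dtype=np.uint16),
            np.random.randn(n_steps, 48).astype(np.float32),
            np.random.randint(0, 2, (n_steps, 53), dtype=np.uint16),
            np.random.randint(0, 2000, (n_steps, 53), dtype=np.uint16))

def matches_queue_spec(example):
    """Return True if every array is 2-D with the expected dtype and width."""
    if len(example) != len(EXPECTED):
        return False
    for arr, (dtype, width) in zip(example, EXPECTED):
        if arr.dtype != dtype or arr.ndim != 2 or arr.shape[1] != width:
            return False
    return True

example = make_example()
print(matches_queue_spec(example))
```

Running a check like this before feeding data makes shape or dtype mismatches show up in plain NumPy code rather than inside the enqueue call.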
