

one_hot encoding for batches will be incomplete tensorflow

As you know, tf.one_hot can do one-hot encoding. However, my dataset is very large, so I need to do batch training. When I use a for loop to iterate over all batches and call tf.one_hot in each iteration, the dimension of the one-hot matrix is smaller than I expect.

For example, column 'a' has 47 categories, but in one batch only 20 of them might appear, and when I do one_hot on that batch it creates a matrix with dimension rows * 20 instead of rows * 47.

How can I get a rows * 47 one-hot matrix in each batch?

Thank you!

tf.one_hot() takes an argument, depth, as its second parameter, which determines how long the one-hot vector should be. If you run your operation like this:

b = tf.one_hot( a, 47 )

it should give you a last dimension of 47.
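
For instance, here is a minimal sketch (the batch label values below are made up for illustration, and it assumes the 47 categories are encoded as integers 0..46) showing that a fixed depth keeps the width at 47 even when only a few categories appear in the batch:

import tensorflow as tf

# hypothetical batch of labels for column 'a'; only a handful of the
# 47 categories happen to appear in this batch
batch_labels = tf.constant([0, 3, 19, 7, 12])

# fixing depth at the full category count keeps the width at 47,
# no matter which categories show up in the batch
one_hot = tf.one_hot(batch_labels, 47)

print(one_hot.shape)  # (5, 47)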

Tough to say without the code, but some people don't hard-code the one_hot size and instead try to get it from the label tensor, with something like

max_class = tf.reduce_max( a )
b = tf.one_hot( a, max_class )

If that is the case in your code, then maybe that batch only went up to class 20.
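
Here is a small sketch of the contrast, again with made-up labels (note that for 0-indexed labels the batch-derived depth would actually need max_class + 1; either way, the width tracks the batch rather than the full 47 categories):

import tensorflow as tf

batch_labels = tf.constant([0, 3, 19, 7, 12])  # max class in this batch is 19

# depth derived from the batch: the width depends on the batch contents
max_class = tf.reduce_max(batch_labels)
b_batch = tf.one_hot(batch_labels, max_class + 1)  # shape (5, 20)

# depth hard-coded to the full category count: the width is stable
b_fixed = tf.one_hot(batch_labels, 47)             # shape (5, 47)

print(b_batch.shape, b_fixed.shape)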

Otherwise, I'd need to see your code to say more.

If TensorFlow were running out of memory, it would stop with an error; it wouldn't just silently bite off half of your data. :)
