
Good way to feed input data of different sizes into neural network? (Tensorflow)

My data looks like this. The values are floats stored in a large numpy array of shape [700000, 3]. There are no empty fields.

Label   | Values1   | Values2
1.      | 0.01      | 0.01
1.      | ...       | ...
1.      |
2.      |
2.      |
3.      |
...

The idea is to feed in the set of Values1 and Values2 and have it identify the label using classification.

But I don't want to feed the data row by row; instead I want to input all Values1/Values2 rows that belong to label 1 as one set (e.g. inputting the first 3 rows should return [1,0,...], inputting the next 2 rows as a set should return [0,1,...]).

Is there a non-complex way of feeding the data like this? (i.e. feed a batch where the label column equals 1)

I am currently sorting the data and thinking about keeping a pointer to the start of each set, with a loop that checks whether the next row has the same label as the current one to find the end of the set and the number of rows in that batch. But this more or less prevents randomizing the input order.
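A minimal sketch of that grouping step (an illustration only, assuming the array is already sorted by its label column): np.unique with return_index yields each set's start pointer, and shuffling the resulting (start, end) pairs restores a randomized set order.

import numpy as np

# Toy stand-in for the real [700000, 3] array, already sorted by the label column
data = np.array([[1., .01, .01], [1., .02, .03], [1., .05, .02],
                 [2., .11, .12], [2., .13, .10],
                 [3., .20, .21]])

# np.unique gives the first row index of each label; the next start marks the set's end
labels, starts = np.unique(data[:, 0], return_index=True)
ends = np.append(starts[1:], len(data))

# Shuffle the sets (not the rows) to keep a randomized input order
bounds = list(zip(starts, ends))
np.random.shuffle(bounds)
for start, end in bounds:
    batch = data[start:end, 1:]   # all Values1/Values2 rows of one set
    label = data[start, 0]        # that set's label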

Since you have your data in a numpy array (let's call it data), you can use

single_digit = data[(data[:,0] == 1.)][: , 1:]

which will compare the zeroth element of each row with the digit (1. in this case) and select only the rows having the label 1.0. From these rows, it takes the first and second elements, i.e. Values1 and Values2. A working example is below. You can use a for loop to iterate over all labels contained in the data set and construct a numpy array for each label with

single_digit = data[(data[:,0] == label_of_this_iteration)][: , 1:]

and then feed these arrays to the network. Within TensorFlow you can easily feed batches of different length if you do not specify the first dimension of the corresponding placeholders.

import numpy as np
# Generate some data with three columns (label, Values1, Values2)
n = 20
ints = np.random.randint(1,6,(n, 1))
dous = np.random.uniform(size=(n,2))
data = np.hstack((ints, dous))
print(data)

# Extract the second and third columns of all rows having the label 1.0
ones = data[(data[:,0] == 1.)][: , 1:]
print(ones)
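To actually feed such variable-sized sets, the placeholder's first dimension can be left as None so it accepts any number of rows. A minimal TF1-style sketch (the op here is only a stand-in for a real network, and it assumes the ones array from the snippet above):

import tensorflow as tf  # TF1 graph API; under TF2 use tf.compat.v1

# First dimension left unspecified -> any number of (Values1, Values2) rows per batch
x = tf.placeholder(tf.float32, shape=[None, 2], name="values")

# Toy stand-in for a network: per-column mean of the fed set
col_mean = tf.reduce_mean(x, axis=0)

with tf.Session() as sess:
    # `ones` from the example above may contain any number of rows
    print(sess.run(col_mean, feed_dict={x: ones}))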

Ideally use TFRecords format.

This approach makes it easier to mix and match data sets and network architectures.

Here is a link with detail on what this JSON-like structure looks like: example.proto
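A minimal sketch of writing one tf.train.Example per label set, assuming tf.io.TFRecordWriter (available in recent TF 1.x and TF 2.x) and an arbitrary file name sets.tfrecord; variable-length FloatLists let each set keep its own size:

import numpy as np
import tensorflow as tf

# `data` as above: an [N, 3] array of (label, Values1, Values2) rows
with tf.io.TFRecordWriter("sets.tfrecord") as writer:
    for label in np.unique(data[:, 0]):
        rows = data[data[:, 0] == label][:, 1:]
        example = tf.train.Example(features=tf.train.Features(feature={
            "label":   tf.train.Feature(float_list=tf.train.FloatList(value=[float(label)])),
            "values1": tf.train.Feature(float_list=tf.train.FloatList(value=rows[:, 0].tolist())),
            "values2": tf.train.Feature(float_list=tf.train.FloatList(value=rows[:, 1].tolist())),
        }))
        writer.write(example.SerializeToString())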
