"IndexError: list index out of range" in model.fit() when using a Dataset in a TensorFlow Keras classifier

Question

I'm new to TensorFlow and I'm trying to create a classifier using Keras. My training data is split into two files:
- one with training examples; each example is a vector of 64 floats
- a second with labels; each label is an int in the range (0,..,SIZE) (SIZE is 100) that describes a class.

Both files are quite large and I can't fit them into memory, so I've used tf.data.Dataset. I create two Datasets (one for features and one for labels) and then merge them using tf.data.Dataset.zip(). However, during training I get an "IndexError: list index out of range" error, even though the input data looks fine when I print it. This is the code:

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(SIZE, activation='softmax'))
opt = tf.keras.optimizers.Adagrad()
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

#read data to: data and label
dataset = tf.data.Dataset.zip((data, label))
#iterator = dataset.make_one_shot_iterator()
#next_element = iterator.get_next()
#print(next_element[0])
#print("\n\n")
#print(next_element[1])

model.fit(dataset, epochs=50)

The error message is:

Epoch 1/50
Traceback (most recent call last):

  File "<ipython-input-129-f200a4503ff9>", line 1, in <module>
    runfile('D:/ai/collab-filter/dan.py', wdir='D:/ai/collab-filter')

  File "C:\Users\DELL\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
    execfile(filename, namespace)

  File "C:\Users\DELL\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "D:/ai/myModel.py", line 256, in <module>
    model.fit(dataset, epochs=50)

  File "C:\Users\DELL\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 694, in fit
    initial_epoch=initial_epoch)

  File "C:\Users\DELL\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')

  File "C:\Users\DELL\Anaconda3\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 256, in model_iteration
    batch_size = int(nest.flatten(batch_data)[0].shape[0])

  File "C:\Users\DELL\Anaconda3\lib\site-packages\tensorflow\python\framework\tensor_shape.py", line 868, in __getitem__
    return self._dims[key]

IndexError: list index out of range

When I uncomment the code that prints an example, I get the following (it looks fine to me):

(<tf.Tensor: id=281899, shape=(), dtype=float32, numpy=0.0011093982>, <tf.Tensor: id=281900, shape=(), dtype=float32, numpy=0.008290171>, <tf.Tensor: id=281901, shape=(), dtype=float32, numpy=0.010696268>, <tf.Tensor: id=281902, shape=(), dtype=float32, numpy=-0.0068962937>, <tf.Tensor: id=281903, shape=(), dtype=float32, numpy=0.0020472356>, <tf.Tensor: id=281904, shape=(), dtype=float32, numpy=0.0041239075>, <tf.Tensor: id=281905, shape=(), dtype=float32, numpy=-0.0018036675>, <tf.Tensor: id=281906, shape=(), dtype=float32, numpy=-0.007521228>, <tf.Tensor: id=281907, shape=(), dtype=float32, numpy=0.012179799>, <tf.Tensor: id=281908, shape=(), dtype=float32, numpy=-0.008569455>, <tf.Tensor: id=281909, shape=(), dtype=float32, numpy=-0.005547243>, <tf.Tensor: id=281910, shape=(), dtype=float32, numpy=-0.024963537>, <tf.Tensor: id=281911, shape=(), dtype=float32, numpy=-0.0047834134>, <tf.Tensor: id=281912, shape=(), dtype=float32, numpy=-0.0073425>, <tf.Tensor: id=281913, shape=(), dtype=float32, numpy=-0.0049664816>, <tf.Tensor: id=281914, shape=(), dtype=float32, numpy=0.0012769673>, <tf.Tensor: id=281915, shape=(), dtype=float32, numpy=-0.008846987>, <tf.Tensor: id=281916, shape=(), dtype=float32, numpy=0.002845391>, <tf.Tensor: id=281917, shape=(), dtype=float32, numpy=-0.0012304187>, <tf.Tensor: id=281918, shape=(), dtype=float32, numpy=-0.0073605254>, <tf.Tensor: id=281919, shape=(), dtype=float32, numpy=-0.019149099>, <tf.Tensor: id=281920, shape=(), dtype=float32, numpy=0.0053162603>, <tf.Tensor: id=281921, shape=(), dtype=float32, numpy=0.00018294304>, <tf.Tensor: id=281922, shape=(), dtype=float32, numpy=-0.007135446>, <tf.Tensor: id=281923, shape=(), dtype=float32, numpy=0.019139009>, <tf.Tensor: id=281924, shape=(), dtype=float32, numpy=0.0031176396>, <tf.Tensor: id=281925, shape=(), dtype=float32, numpy=0.016997647>, <tf.Tensor: id=281926, shape=(), dtype=float32, numpy=-0.017783713>, <tf.Tensor: id=281927, shape=(), dtype=float32, 
numpy=-0.0033694915>, <tf.Tensor: id=281928, shape=(), dtype=float32, numpy=0.02030162>, <tf.Tensor: id=281929, shape=(), dtype=float32, numpy=-0.01870913>, <tf.Tensor: id=281930, shape=(), dtype=float32, numpy=-0.0057595233>, <tf.Tensor: id=281931, shape=(), dtype=float32, numpy=0.013816875>, <tf.Tensor: id=281932, shape=(), dtype=float32, numpy=-0.00463876>, <tf.Tensor: id=281933, shape=(), dtype=float32, numpy=-0.023181098>, <tf.Tensor: id=281934, shape=(), dtype=float32, numpy=0.0064159813>, <tf.Tensor: id=281935, shape=(), dtype=float32, numpy=-0.0018356718>, <tf.Tensor: id=281936, shape=(), dtype=float32, numpy=0.014198529>, <tf.Tensor: id=281937, shape=(), dtype=float32, numpy=-0.019970264>, <tf.Tensor: id=281938, shape=(), dtype=float32, numpy=-0.013106668>, <tf.Tensor: id=281939, shape=(), dtype=float32, numpy=0.01739781>, <tf.Tensor: id=281940, shape=(), dtype=float32, numpy=-0.0075084846>, <tf.Tensor: id=281941, shape=(), dtype=float32, numpy=-0.007515852>, <tf.Tensor: id=281942, shape=(), dtype=float32, numpy=0.008860749>, <tf.Tensor: id=281943, shape=(), dtype=float32, numpy=0.011078904>, <tf.Tensor: id=281944, shape=(), dtype=float32, numpy=0.0031385398>, <tf.Tensor: id=281945, shape=(), dtype=float32, numpy=0.00069636817>, <tf.Tensor: id=281946, shape=(), dtype=float32, numpy=0.016473386>, <tf.Tensor: id=281947, shape=(), dtype=float32, numpy=0.010464343>, <tf.Tensor: id=281948, shape=(), dtype=float32, numpy=0.009564337>, <tf.Tensor: id=281949, shape=(), dtype=float32, numpy=-0.00023193806>, <tf.Tensor: id=281950, shape=(), dtype=float32, numpy=-0.0043777116>, <tf.Tensor: id=281951, shape=(), dtype=float32, numpy=0.0033248402>, <tf.Tensor: id=281952, shape=(), dtype=float32, numpy=0.0020942744>, <tf.Tensor: id=281953, shape=(), dtype=float32, numpy=0.00989055>, <tf.Tensor: id=281954, shape=(), dtype=float32, numpy=0.000547247>, <tf.Tensor: id=281955, shape=(), dtype=float32, numpy=-0.0011691392>, <tf.Tensor: id=281956, shape=(), dtype=float32, 
numpy=-0.033643395>, <tf.Tensor: id=281957, shape=(), dtype=float32, numpy=-0.0014932752>, <tf.Tensor: id=281958, shape=(), dtype=float32, numpy=0.012660088>, <tf.Tensor: id=281959, shape=(), dtype=float32, numpy=0.0124913>, <tf.Tensor: id=281960, shape=(), dtype=float32, numpy=-0.010591994>, <tf.Tensor: id=281961, shape=(), dtype=float32, numpy=-0.030872872>, <tf.Tensor: id=281962, shape=(), dtype=float32, numpy=-0.0014752604>)



tf.Tensor(
[0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], shape=(100,), dtype=int32)

Answer 1

I was able to solve this issue. As I said, I'm new to TensorFlow, and I wasn't aware that the result of tf.data.experimental.CsvDataset() (which I use to read my data) can't be used directly to train the model. Each element comes back as a tuple of 64 scalar Tensors, but a training example passed to model.fit() has to have at least 2 dimensions: the first is the batch size (it can be 1) and the second is the size of the example itself. In my code the first Tensor of the tuple (for example <tf.Tensor: id=281899, shape=(), dtype=float32, numpy=0.0011093982>) was treated as the first training example, so its shape was (), which caused the error.

While trying to fix the problem I changed a few things, which produced a different error, and that led me to the solution: each example has to be reshaped. This is my current version of the function that reads training examples (DATA_SIZE is 64):

def expMap(*item):
    # CsvDataset yields one scalar tensor per column; pack them into
    # a single example of shape [1, DATA_SIZE].
    item2 = tf.reshape(item, [1, DATA_SIZE])
    return item2

def generate_trainExp(nameD):
    filename = os.path.join(DATA, "exps{}.txt".format(nameD))
    # One float32 column per feature, space-delimited.
    record_defaults = [tf.float32] * DATA_SIZE
    dataset = tf.data.experimental.CsvDataset(filename, record_defaults, field_delim=' ')
    dataset = dataset.map(expMap)
    return dataset

.....
data = generate_trainExp(filename)

In expMap() the input item can also be converted to a vector of 64 floats before the reshape call (item = tf.convert_to_tensor(item, dtype=tf.float32)).
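For anyone who wants to try the idea without the original files, here is a minimal sketch on a small in-memory dataset. It assumes TensorFlow 2.x with eager execution (the question appears to use an older 1.x release), and the names to_vector, columns, features and labels are made up for illustration; they stand in for the CsvDataset output and the label dataset. It is also a slight variation on the code above, not the same method: instead of reshaping each example to [1, DATA_SIZE], it stacks the 64 scalars into one vector and lets batch() add the leading batch dimension.

```python
import numpy as np
import tensorflow as tf

DATA_SIZE = 64   # floats per example, as in the question
SIZE = 100       # number of classes, as in the question
N = 8            # number of fake examples (assumption, for illustration only)

# Stand-in for the CsvDataset output: every element is a tuple of
# DATA_SIZE scalar tensors, one per column.
columns = tuple(np.random.rand(N).astype(np.float32) for _ in range(DATA_SIZE))
features = tf.data.Dataset.from_tensor_slices(columns)

# Stand-in for the label dataset: one-hot vectors of length SIZE.
one_hot = np.eye(SIZE, dtype=np.float32)[np.arange(N)]
labels = tf.data.Dataset.from_tensor_slices(one_hot)

# The raw element is a tuple of rank-0 tensors, which is what broke fit().
first = next(iter(features))[0]
print("rank of one raw column value:", first.shape.rank)

def to_vector(*item):
    # Stack the scalar columns into one tensor of shape (DATA_SIZE,).
    return tf.stack(list(item))

# Zip features with labels, then batch so that every tensor gets a
# leading batch dimension.
dataset = tf.data.Dataset.zip((features.map(to_vector), labels)).batch(4)

x, y = next(iter(dataset))
print("feature batch shape:", x.shape)
print("label batch shape:", y.shape)
```

A dataset shaped like this has a batch dimension first and the example size second, so it should be usable as the argument to model.fit(dataset, epochs=50) from the question.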

Solution 1
0 Accepted 2019-07-12 13:16:39
