Want to split train and test data gotten from a csv with tensorflow

I want to split the data from a CSV file into train and test sets with TensorFlow. I couldn't find a TensorFlow function like np.loadtxt, so I tried loading the data with NumPy, converting it to tensors, and splitting it, but I get the error below:

      TypeError: object of type 'Tensor' has no len()

Here is my code:

    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    x = tf.convert_to_tensor(np.loadtxt('data.csv', delimiter=','))
    y = tf.convert_to_tensor(np.loadtxt('labels.csv', delimiter=','))

    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.25, random_state='')

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(426, 30, 1)),
        tf.keras.layers.Dense(126, activation=tf.nn.tanh),
        # tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation=tf.nn.tanh)
    ])

    model.compile(optimizer='sgd',
                  loss='mean_squared_error',
                  metrics=['accuracy'])

    history = model.fit(x_train, y_train, epochs=5)  # validation_data=[x_test, y_test])
    model.evaluate(x_test, y_test)

    t_predicted = model.predict(x_test)
    out_predicted = np.argmax(t_predicted, axis=1)
    conf_matrix = tf.confusion_matrix(y_test, out_predicted)
    with tf.Session():
        print('Confusion Matrix: \n\n',
              tf.Tensor.eval(conf_matrix, feed_dict=None, session=None))

    # summarize history for accuracy
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

Wouldn't it be simpler to first load the csv file, do the split, and then give TensorFlow the result of the split?

sklearn.model_selection.train_test_split() is not meant to work with the Tensor objects you're getting from tf.convert_to_tensor().

Reversing the order (split with NumPy first, then convert to tensors) made your code work in a small test script:

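    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import train_test_split

    # Load as plain NumPy arrays first; train_test_split can work with these.
    x = np.loadtxt('data.csv', delimiter=',')
    y = np.loadtxt('labels.csv', delimiter=',')

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)

    # Convert to tensors only after the split.
    x_train = tf.convert_to_tensor(x_train)
    x_test = tf.convert_to_tensor(x_test)
    y_train = tf.convert_to_tensor(y_train)
    y_test = tf.convert_to_tensor(y_test)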

The best practice is not to load the full data into a tensor. If your code runs on a GPU and your data is huge, the tensor might occupy a significant amount of GPU memory, resulting in "Out of Memory" errors. The usual approach is:

  • Load the data into a variable in RAM (usually a NumPy array).
  • Split the data into train and validation sets (using something like train_test_split).
  • Iterate over each split one batch at a time, creating a tensor for each batch. Use the training batches to train and the validation batches to validate.
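The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the synthetic arrays stand in for the output of np.loadtxt, and `split_train_val` and `iter_batches` are hypothetical helper names (the first is a NumPy-only stand-in for train_test_split so the example has no extra dependencies).

```python
import numpy as np

def split_train_val(x, y, val_fraction=0.25, seed=0):
    """Hypothetical helper: shuffle row indices once, then slice x and y alike."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(round(len(x) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

def iter_batches(x, y, batch_size):
    """Hypothetical helper: yield consecutive slices; the last may be smaller."""
    for start in range(0, len(x), batch_size):
        stop = start + batch_size
        yield x[start:stop], y[start:stop]

# Synthetic stand-in for np.loadtxt('data.csv', ...) and np.loadtxt('labels.csv', ...).
x = np.arange(100 * 30, dtype=np.float32).reshape(100, 30)
y = np.arange(100) % 10

x_train, x_val, y_train, y_val = split_train_val(x, y)

batch_shapes = []
for x_batch, y_batch in iter_batches(x_train, y_train, batch_size=32):
    # Only this slice would be turned into a tensor, for example with
    # tf.convert_to_tensor(x_batch), right before a training step.
    batch_shapes.append(x_batch.shape)
```

The full arrays stay in host memory as NumPy data, and only one batch at a time needs to exist as a tensor.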

When dealing with very large images, we cannot load all of them into memory. The usual approach here is to load one batch of images into a tensor and use it for training/validation. Deep learning frameworks provide mechanisms to load multiple batches in background threads, so that the next training step does not have to wait for the next batch to be loaded.
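The idea of loading ahead in a background thread can be illustrated with the standard library alone. This is only a sketch of the concept, not how any particular framework implements it; `prefetch` and `load_batches` are hypothetical names, and `load_batches` merely stands in for reading and decoding images from disk.

```python
import queue
import threading

def prefetch(batches, buffer_size=2):
    """Yield items from `batches` while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()
        if batch is done:
            return
        yield batch

def load_batches(num_batches, batch_size):
    """Hypothetical loader: stands in for reading images from disk."""
    for b in range(num_batches):
        yield [b * batch_size + i for i in range(batch_size)]

# The consumer (the training loop) pulls batches that were loaded in advance.
consumed = [batch for batch in prefetch(load_batches(3, 4))]
```

While the consumer works on one batch, the producer thread keeps filling the bounded queue, so at most `buffer_size` batches are held in memory at once. For brevity the sketch does not forward errors raised by the loader to the consumer.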

If you still want to load the full data into a tensor and split it into train and test tensors, you can use the TensorFlow method tf.split: https://www.tensorflow.org/api_docs/python/tf/split
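A small sketch of that approach, assuming synthetic data in place of the CSV files and a 75/25 split. Note that tf.split only cuts a tensor into contiguous pieces, so this example shuffles the rows first with a shared index tensor; that shuffling step is an addition for illustration and is not part of tf.split itself.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the arrays loaded from data.csv and labels.csv.
x = tf.convert_to_tensor(np.arange(40, dtype=np.float32).reshape(8, 5))
y = tf.convert_to_tensor(np.arange(8, dtype=np.float32))

n = int(x.shape[0])
n_test = n // 4          # 25% of the rows go to the test set
n_train = n - n_test

# Shuffle features and labels with the same indices so rows stay aligned.
idx = tf.random.shuffle(tf.range(n))
x_shuffled = tf.gather(x, idx)
y_shuffled = tf.gather(y, idx)

# Passing a list of sizes splits along axis 0 into pieces of those lengths.
x_train, x_test = tf.split(x_shuffled, [n_train, n_test], axis=0)
y_train, y_test = tf.split(y_shuffled, [n_train, n_test], axis=0)
```

With these sizes, the train tensors hold 6 rows and the test tensors hold 2.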
