Want to split train and test data read from a CSV with TensorFlow
I wanted to split the data from a CSV file into train and test sets with TensorFlow, but I didn't find a function like np.loadtxt in TensorFlow. So I tried to do the split with NumPy and convert the result to a tensor, but I get an error like the one below:
TypeError: object of type 'Tensor' has no len()
And here is my code:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
x = tf.convert_to_tensor( np.loadtxt('data.csv', delimiter=','))
y = tf.convert_to_tensor(np.loadtxt('labels.csv', delimiter=','))
x_train, x_test, y_train, y_test = train_test_split(x, y,
    test_size=0.25, random_state=42)  # random_state must be an int, a RandomState or None, not ''
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(426, 30, 1)),
    tf.keras.layers.Dense(126, activation=tf.nn.tanh),
    # tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.tanh)
])
model.compile(optimizer='sgd',
              loss='mean_squared_error',
              metrics=['accuracy'])
# validation_data is needed for the 'val_acc' / 'val_loss' curves plotted below
history = model.fit(x_train, y_train, epochs=5,
                    validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
t_predicted = model.predict(x_test)
out_predicted = np.argmax(t_predicted, axis=1)
conf_matrix = tf.confusion_matrix(y_test, out_predicted)
with tf.Session():
    print('Confusion Matrix: \n\n', tf.Tensor.eval(conf_matrix,
          feed_dict=None, session=None))
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
Wouldn't it be simpler to first load the CSV file, do the split, and then give TF the result of the split?
sklearn.model_selection.train_test_split() is not meant to work with the Tensor objects you get from tf.convert_to_tensor(). It expects array-like inputs (NumPy arrays, lists, DataFrames) that support len() and indexing, and a graph-mode Tensor has no len(), hence the TypeError.
Reversing the order (split the NumPy arrays first, then convert the pieces to tensors) made your code work in a small test script:
x = np.loadtxt('data.csv', delimiter=',')
y = np.loadtxt('labels.csv', delimiter=',')
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
x_train = tf.convert_to_tensor(x_train)
x_test = tf.convert_to_tensor(x_test)
y_train = tf.convert_to_tensor(y_train)
y_test = tf.convert_to_tensor(y_test)
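If you would rather not pull in scikit-learn just for the split, the same split-then-convert idea can be done with NumPy alone: shuffle the row indices once, apply that permutation to both x and y, and slice off the test rows. This is a minimal sketch, not what train_test_split does internally; `split_arrays` is a hypothetical helper, it assumes row i of x belongs with row i of y, and small synthetic arrays stand in for data.csv and labels.csv.

```python
import numpy as np


def split_arrays(x, y, test_size=0.25, seed=None):
    """Hypothetical helper: shuffle x and y with the same permutation, then slice off a test set."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))              # one shared permutation keeps (x, y) pairs aligned
    n_test = int(np.ceil(test_size * len(x)))   # round the test size up, as train_test_split does for a float
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


# Synthetic stand-ins for np.loadtxt('data.csv', ...) and np.loadtxt('labels.csv', ...)
x = np.arange(40, dtype=float).reshape(20, 2)   # row i is [2i, 2i + 1]
y = np.arange(20, dtype=float)                  # label i is i

x_train, x_test, y_train, y_test = split_arrays(x, y, test_size=0.25, seed=0)
print(x_train.shape, x_test.shape)  # (15, 2) (5, 2)
```

The resulting arrays can then be converted with tf.convert_to_tensor() exactly as in the snippet above.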
Best practice is not to load the full dataset into a tensor. If your code runs on a GPU and your data is huge, the tensor can occupy a significant amount of GPU memory and result in "Out of Memory" errors. The usual approach is to split the data into train and validation batches (using something like train_test_split).
When dealing with large images, we cannot load all of them into memory. The usual approach there is to load one batch of images into a tensor at a time and use it for training/validation. Deep learning frameworks provide mechanisms to load multiple batches in background threads, so that the next training step does not have to wait for the next batch to be loaded.
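The batch-at-a-time idea can be sketched in plain Python/NumPy as a generator that streams rows from the two CSV files and yields fixed-size batches, so only one batch sits in memory at once. This is a minimal, single-threaded sketch, assuming data.csv and labels.csv are comma-separated with one sample per line in the same order; `iter_csv_batches` is a hypothetical name, not a TensorFlow API. In TensorFlow itself, the built-in route is the tf.data API (for example tf.data.Dataset.from_generator around a generator like this, followed by .prefetch() so loading overlaps with training).

```python
import csv
import os
import tempfile

import numpy as np


def iter_csv_batches(data_path, labels_path, batch_size=32):
    """Hypothetical helper: yield (x_batch, y_batch) NumPy arrays, reading both CSVs lazily.

    At most `batch_size` rows are held in memory at a time.
    """
    with open(data_path, newline="") as fx, open(labels_path, newline="") as fy:
        x_rows, y_rows = [], []
        # zip() walks both readers in lockstep, so row i of the data pairs with row i of the labels
        for x_row, y_row in zip(csv.reader(fx), csv.reader(fy)):
            x_rows.append([float(v) for v in x_row])
            y_rows.append([float(v) for v in y_row])
            if len(x_rows) == batch_size:
                yield np.array(x_rows), np.array(y_rows)
                x_rows, y_rows = [], []
        if x_rows:  # the last batch may be smaller than batch_size
            yield np.array(x_rows), np.array(y_rows)


# Usage with two small throwaway files standing in for data.csv / labels.csv
tmp_dir = tempfile.mkdtemp()
data_path = os.path.join(tmp_dir, "data.csv")
labels_path = os.path.join(tmp_dir, "labels.csv")
with open(data_path, "w") as f:
    f.write("\n".join(f"{i},{i + 0.5}" for i in range(10)))
with open(labels_path, "w") as f:
    f.write("\n".join(str(i % 2) for i in range(10)))

for x_batch, y_batch in iter_csv_batches(data_path, labels_path, batch_size=4):
    print(x_batch.shape, y_batch.shape)
# (4, 2) (4, 1)
# (4, 2) (4, 1)
# (2, 2) (2, 1)
```

Labels come out as 2-D (one column per CSV field); call .ravel() on y_batch if your model expects a flat label vector.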
If you still want to load the full data into a tensor and split it into train and test tensors, you can use the TensorFlow method tf.split: https://www.tensorflow.org/api_docs/python/tf/split
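A minimal sketch of the tf.split route, assuming TensorFlow 2.x with eager execution (in the TF1 graph mode used in the question, the same ops exist but the results have to be evaluated inside a session). tf.split only cuts a tensor along an axis and does not shuffle, so the sketch first shuffles the row indices with tf.random.shuffle and applies the same permutation to x and y with tf.gather. The 75/25 ratio and the synthetic tensors are assumptions standing in for the data loaded from the CSV files.

```python
import tensorflow as tf

# Synthetic stand-ins for the tensors built from data.csv and labels.csv
x = tf.reshape(tf.range(200, dtype=tf.float32), (100, 2))  # row i is [2i, 2i + 1]
y = tf.range(100, dtype=tf.float32)                        # label i is i

n = x.shape[0]            # static row count, known in eager mode
n_test = int(0.25 * n)
n_train = n - n_test

# Shuffle the row indices once and reuse them, so (x, y) pairs stay aligned
perm = tf.random.shuffle(tf.range(n), seed=0)
x_shuffled = tf.gather(x, perm)
y_shuffled = tf.gather(y, perm)

# Passing a list of sizes cuts axis 0 into pieces of n_train and n_test rows
x_train, x_test = tf.split(x_shuffled, [n_train, n_test], axis=0)
y_train, y_test = tf.split(y_shuffled, [n_train, n_test], axis=0)

print(x_train.shape[0], x_test.shape[0])  # 75 25
```

Note that this still holds the whole dataset in one tensor, so the memory caveat above applies.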