How do I write a Dataset from a generator to replace Dataset.from_tensor_slices in TensorFlow for a tabular dataset with X_train and y_train?
The following works, but it consumes all of my GPU memory as the dataset gets larger.
tf_train = tf.data.Dataset.from_tensor_slices((X_train, y_train)).shuffle(1000).batch(512, drop_remainder=True).prefetch(1)
I tried various options but am stuck on how to write the generator.
tf_train = tf.data.Dataset.from_generator(generator=my_gen, output_signature=??)
I don't know how to write my_gen, nor the syntax for the output signature.
X_train is a DataFrame of numerical features, and y_train is a DataFrame containing a numerical target variable.
You could change your generator function to:
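import numpy as np

def generate_sample():
    # Toy data: 9 feature values and 4 target values, parsed from digit characters.
    x = list("123456789")
    y = list("2345")
    while True:
        # Yield one (features, (target, target)) element per iteration, all float32.
        yield np.array(x).astype(np.float32), (
            np.array(y).astype(np.float32),
            np.array(y).astype(np.float32),
        )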
The output signature must mirror the structure the generator yields: one tf.TensorSpec for x (shape (9,), dtype tf.float32) and a tuple of two tf.TensorSpec entries for the two y arrays (shape (4,), dtype tf.float32 each). Pass that nested tuple as output_signature to tf.data.Dataset.from_generator.
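Applied to the tabular case in the question, the same idea could look like the sketch below. It is only an illustration under stated assumptions: the random X_train and y_train frames are stand-ins for the real data, make_gen is a hypothetical helper name, and y_train is assumed to hold a single target column. The generator yields one (features, target) row at a time from host-side NumPy arrays, and the output signature describes one such row.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical stand-ins for the question's X_train / y_train DataFrames.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(2048, 8)).astype(np.float32))
y_train = pd.DataFrame(rng.normal(size=(2048, 1)).astype(np.float32))


def make_gen(X, y):
    """Return a zero-argument generator function yielding (features, target) rows."""
    X_np = X.to_numpy(dtype=np.float32)
    # Assumption: y has exactly one column, so it flattens to one scalar per row.
    y_np = y.to_numpy(dtype=np.float32).reshape(-1)

    def gen():
        for features, target in zip(X_np, y_np):
            yield features, target

    return gen


# One TensorSpec per yielded item: a feature vector and a scalar target.
output_signature = (
    tf.TensorSpec(shape=(X_train.shape[1],), dtype=tf.float32),
    tf.TensorSpec(shape=(), dtype=tf.float32),
)

tf_train = (
    tf.data.Dataset.from_generator(
        make_gen(X_train, y_train), output_signature=output_signature
    )
    .shuffle(1000)
    .batch(512, drop_remainder=True)
    .prefetch(1)
)
```

The rest of the pipeline is unchanged from the question. A row-by-row Python generator is likely to be slower than from_tensor_slices, so yielding larger chunks from the generator and calling unbatch() before shuffle may be worth trying if throughput becomes a problem.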