
How to upload my training data into Google for TensorFlow cloud training

I want to train my Keras model on GCP.

My code:

This is how I load the dataset:

dataset = pandas.read_csv('USDJPY.fx5.csv', usecols=[2, 3, 4, 5], engine='python')

This is how I trigger cloud training:

job_labels = {"job": "forex-usdjpy", "team": "xxx", "user": "xxx"}
tfc.run(requirements_txt="./requirements.txt",
        job_labels=job_labels,
        stream_logs=True
        )

This goes right before my model, which shouldn't make much of a difference:

model = Sequential()
model.add(LSTM(4, input_shape=(1, 4)))
model.add(Dropout(0.2))
model.add(Dense(4))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=1, batch_size=1, verbose=2)

Everything works: the Docker image for my model is created, but the USDJPY.fx5.csv file is not uploaded with it, so I get a file-not-found error.

What is the proper way of loading custom files into the training job? I uploaded the training data to a GCS bucket, but I wasn't able to tell Google to look there.

Turns out it was a problem with my GCP configuration. Here are the steps I took to make it work:

  • Create a GCS bucket and make all files inside it public so the training job can access them (one way to do this is sketched after this list)

  • Include fsspec and gcsfs in the requirements (an illustrative requirements file also follows this list)

  • Remove the 'engine' parameter from pandas.read_csv, like so:

    dataset = pandas.read_csv('gs://<bucket>/USDJPY.fx5.csv', usecols=[2, 3, 4, 5])
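For the first step, here is a minimal sketch of one way to make the file public, assuming the google-cloud-storage client library (pip install google-cloud-storage); the bucket name is a placeholder. Note that if the bucket uses uniform bucket-level access, per-object ACLs are disabled and you would instead grant allUsers the Storage Object Viewer role via IAM.

from google.cloud import storage

# Sketch only: "<bucket>" is a placeholder for your own bucket name.
client = storage.Client()
blob = client.bucket("<bucket>").blob("USDJPY.fx5.csv")
blob.make_public()        # grants allUsers read access to this object
print(blob.public_url)    # sanity-check that the file is reachable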
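For the second step, the requirements file just needs those two packages on top of whatever the training script itself imports; something along these lines (contents illustrative, adjust to your own script):

# requirements.txt (illustrative)
fsspec          # lets pandas resolve filesystem-style URLs
gcsfs           # the gs:// backend for fsspec
pandas          # plus whatever else your script imports,
scikit-learn    # e.g. these
keras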

Since you are uploading the Python file to GCP, a good way to organize your code is to put all of the training logic into a method and then call it conditionally on the cloud-training flag:

if tfc.remote():
    train()

Here is the whole working code, if someone is interested:

import pandas
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from sklearn.preprocessing import MinMaxScaler
import tensorflow_cloud as tfc
import os

os.environ["PATH"] = os.environ["PATH"] + ":<path to google-cloud-sdk/bin"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "<path to google credentials json (you can generate this through their UI"


def create_dataset(data):
    # Predict the next row from the current one: X is rows 0..n-2, Y is rows 1..n-1.
    dataX = data[0:len(data) - 1]
    dataY = data[1:]
    return numpy.array(dataX), numpy.array(dataY)

def train():
    # Read the training data straight from GCS (requires fsspec + gcsfs).
    dataset = pandas.read_csv('gs://<bucket>/USDJPY.fx5.csv', usecols=[2, 3, 4, 5])

    # Scale all features to the (-1, 1) range before feeding the LSTM.
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(dataset)

    dataset = scaler.transform(dataset)

    # split into train and test sets
    train_size = int(len(dataset) * 0.67)
    train, test = dataset[0:train_size], dataset[train_size:len(dataset)]

    trainX, trainY = create_dataset(train)

    # Reshape to [samples, time steps, features], as the LSTM layer expects.
    trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))

    model = Sequential()
    model.add(LSTM(4, input_shape=(1, 4)))
    model.add(Dropout(0.2))
    model.add(Dense(4))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(trainX, trainY, epochs=1000, verbose=1)


job_labels = {"job": "forex-usdjpy", "team": "zver", "user": "zver1"}
# Package this script into a Docker image and submit the training job.
tfc.run(requirements_txt="./requirements.txt",
        job_labels=job_labels,
        stream_logs=True
        )

# tfc.remote() is True only on the cloud worker, so train() runs remotely,
# not on the machine that submits the job.
if tfc.remote():
    train()
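As a quick illustration of what create_dataset is doing (pairing each row with the row that follows it, so the network learns one-step-ahead prediction), here is a toy run using the function and the numpy import from the listing above:

# Toy data: three rows of four features.
sample = numpy.array([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12]])
X, Y = create_dataset(sample)
print(X.shape, Y.shape)   # (2, 4) (2, 4)
print(Y[0])               # [5 6 7 8] -- the row that follows X[0]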

NOTE: This is probably not an optimal LSTM config; take it with a grain of salt.
