Keras/Tensorflow training on GCP with TPU
I am trying to train a model on GCP with Keras and TensorFlow 1.15. So far my code is similar to what I would run on Colab, namely:
# TPUs
import tensorflow as tf
print(tf.__version__)
cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver("tpu-name")
tf.config.experimental_connect_to_cluster(cluster_resolver)
tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
print("Number of accelerators: ", tpu_strategy.num_replicas_in_sync)
import numpy as np
np.random.seed(123) # for reproducibility
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Input
from tensorflow.keras import utils
from tensorflow.keras.datasets import mnist, cifar10
from tensorflow.keras.models import Model
# 4. Load data into train and test sets
(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) = load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)
print(X_train.shape, X_test.shape)
# 5. Preprocess input data
#X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
#X_test = X_test.reshape(X_test.shape[0], 28, 28,1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0
X_test /= 255.0
print(y_train.shape, y_test.shape)
# 6. Preprocess class labels One hot encoding
Y_train = utils.to_categorical(y_train, 2)
Y_test = utils.to_categorical(y_test, 2)
print(Y_train.shape, Y_test.shape)
with tpu_strategy.scope():
    model = make_model((img_size, img_size, 3))
    # 8. Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer="sgd",
                  metrics=['accuracy'])
    model.summary()
batch_size = 1250 * tpu_strategy.num_replicas_in_sync
# 9. Fit model on training data
model.fit(X_train, Y_train, steps_per_epoch=len(X_train)//batch_size,
          epochs=5, verbose=1)
But my data is in a bucket and my code is on a VM. I tried to load my data using "gs://BUCKETS" but it does not work. What should I do?
EDIT: I have added my code to load the data; sorry, I forgot it.
# Imports the function relies on (assumed; they were not shown in the original post)
from os import listdir
from numpy import asarray
from sklearn.utils import shuffle
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def load_data(sets="dogcats/train/", k = 5000, target_size=250):
    # define location of dataset
    folder = sets
    photos, labels = list(), list()
    # determine class
    output = 0.0
    for i, dog in enumerate(listdir(folder + "dogs/")):
        if i >= k:
            break
        # load image
        photo = load_img(folder + "dogs/" + dog, target_size=(target_size, target_size))
        # convert to numpy array
        photo = img_to_array(photo)
        # store
        photos.append(photo)
        labels.append(output)
    output = 1.0
    for i, cat in enumerate(listdir(folder + "cats/")):
        if i >= k:
            break
        # load image
        photo = load_img(folder + "cats/" + cat, target_size=(target_size, target_size))
        # convert to numpy array
        photo = img_to_array(photo)
        # store
        photos.append(photo)
        labels.append(output)
    # convert to numpy arrays
    photos = asarray(photos)
    labels = asarray(labels)
    print(photos.shape, labels.shape)
    # shuffle images and labels together
    photos, labels = shuffle(photos, labels, random_state=0)
    return photos, labels
EDIT2: To complete the answer of @daudnadeem, in case other people are in the same situation.
My goal was to get images from a bucket; the code works well and returns a bytes object. To turn it into an image you just need the PIL library:
from PIL import Image
from io import BytesIO
import numpy as np
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
blob = bucket.get_blob('dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()
img = Image.open(BytesIO(data))
img = np.array(img)
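To go from one file to a whole training set, the same idea can be looped over the dogs/ and cats/ prefixes, mirroring what load_data does locally. This is only a sketch under some assumptions: the function name load_bytes_from_bucket and its parameters are made up for illustration, client is expected to be a google.cloud.storage.Client, and the bucket layout is assumed to be <prefix>dogs/... and <prefix>cats/... as in the question.

```python
def load_bytes_from_bucket(client, bucket_name, prefix, k=5000,
                           classes=(("dogs/", 0.0), ("cats/", 1.0))):
    """Return (raw image bytes, labels) for up to k objects per class.

    Hypothetical helper: `client` is assumed to behave like
    google.cloud.storage.Client (only list_blobs is used).
    """
    payloads, labels = [], []
    for subdir, label in classes:
        count = 0
        # list_blobs yields every object whose name starts with the prefix
        for blob in client.list_blobs(bucket_name, prefix=prefix + subdir):
            if blob.name.endswith("/"):
                continue  # skip "folder" placeholder objects
            if count >= k:
                break
            payloads.append(blob.download_as_string())  # bytes, as above
            labels.append(label)
            count += 1
    return payloads, labels
```

Each returned payload can then be decoded with the PIL lines above (Image.open(BytesIO(data)), resized to the target size, then np.array) before stacking and shuffling as in load_data. Newer releases of google-cloud-storage also offer download_as_bytes, which returns the same content.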
(X_train, y_train) = load_data(sets="gs://BUCKETS/dogscats/train/",target_size=img_size)
(X_test, y_test) = load_data(sets="gs://BUCKETS/dogscats/valid/",target_size=img_size)
This obviously won't work, since essentially all you've done is give sets a string, which listdir and load_img then treat as a local path. What you need to do is download this data as a string, and then use that.
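If you want to keep passing a "gs://..." string around, a small helper can split it into the two pieces the storage client actually needs: the bucket name and the object prefix. This is a minimal sketch using only the standard library; split_gcs_uri is a hypothetical name, not part of any Google library.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("not a gs:// URI: %r" % uri)
    # Object names inside a bucket normally have no leading slash, so drop it.
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, prefix = split_gcs_uri("gs://BUCKETS/dogscats/train/")
print(bucket_name, prefix)  # BUCKETS dogscats/train/
```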
First install the package:
pip install google-cloud-storage
or
pip3 install google-cloud-storage
(pip -> Python, pip3 -> Python3)
Have a look at this; you will need a service account to interact with GCP from your code, for authentication purposes.
When you have your service account key as a JSON file, you need to do one of two things:
Set it as an environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
or, my preferred workaround:
gcloud auth activate-service-account \
<replace-with-email-from-json-file> \
--key-file=<path/to/your/json/file> --project=<name-of-your-gcp-project>
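The email and project to put in that command can be read straight out of the key file. Here is a small sketch that does so, assuming a standard service-account key (such files carry "type", "client_email" and "project_id" fields); describe_service_account_key is a hypothetical helper name.

```python
import json


def describe_service_account_key(path):
    """Return (client_email, project_id) stored in a service-account JSON key."""
    with open(path) as f:
        key = json.load(f)
    # Service-account keys are tagged with this type; anything else is suspicious.
    if key.get("type") != "service_account":
        raise ValueError("%s does not look like a service-account key" % path)
    return key["client_email"], key.get("project_id")
```

Calling it on the path stored in GOOGLE_APPLICATION_CREDENTIALS is also a quick way to check that the variable points at the file you think it does.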
Now let's look at how you can use the google-cloud-storage library to download your file as a string:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket("BUCKETS")
blob = bucket.get_blob('dogscats/train/<you-will-need-to-point-to-a-file-and-not-a-directory>')
data = blob.download_as_string()
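One thing to watch for: bucket.get_blob returns None when no object has that exact name, so a wrong path only shows up later as an AttributeError on None. A defensive wrapper could look like the sketch below. download_blob_bytes is a hypothetical helper, and stripping a leading "/" is an assumption that your object names do not start with one (the usual case).

```python
def download_blob_bytes(bucket, blob_name):
    """Download one object from `bucket` and return its content as bytes."""
    # Assumption: object names are relative to the bucket, no leading slash.
    blob_name = blob_name.lstrip("/")
    blob = bucket.get_blob(blob_name)  # None when the object does not exist
    if blob is None:
        raise FileNotFoundError("no object named %r in the bucket" % blob_name)
    return blob.download_as_string()


# Usage (needs google-cloud-storage and valid credentials):
#   from google.cloud import storage
#   bucket = storage.Client().get_bucket("BUCKETS")
#   data = download_blob_bytes(bucket, "dogscats/train/<path-to-a-file>")
```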
Now that you have your data as a string, you can simply pass data into load_data like so:
(X_train, y_train) = load_data(sets=data, target_size=img_size)
Note that load_data then has to work on the downloaded bytes rather than list a folder path.
It sounds complex, but here's a quick pseudo layout:
load_data(data)
Hope that helps!