Cloud Storage Buckets for PyTorch

Original question, posted 2018-08-01 17:32:15 · Tags: google-cloud-platform, deep-learning, google-cloud-storage, pytorch

For a particular task I'm working on, I have a dataset of about 25 GB. I'm still experimenting with several preprocessing methods and definitely don't have my data in its final form yet. I'm not sure what the common workflow is for this sort of problem, so here is what I'm thinking:

1. Copy the dataset from the storage bucket to an SSD on a Compute Engine machine (maybe around 50 GB) using gcsfuse.
2. Apply various preprocessing operations as an experiment.
3. Run training with PyTorch on the data stored on the local disk (SSD).
4. If that was successful, copy the newly processed data back to the storage bucket with gcsfuse.
5. Upload the results and delete the persistent disk that was used during training.
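Because gcsfuse exposes a bucket as an ordinary directory, steps 1 and 4 amount to plain file copies between the mount point and the local SSD. Below is a minimal sketch, assuming the bucket is already mounted (for example with `gcsfuse my-bucket /mnt/bucket`, where the bucket name and mount path are hypothetical). The function names `stage_in` and `stage_out` are also hypothetical.

```python
import shutil
from pathlib import Path


def stage_in(mount_dir: str, local_dir: str) -> int:
    """Copy every file under mount_dir into local_dir, keeping relative paths.

    With mount_dir on a gcsfuse mount, each file is read from the bucket
    once here; training epochs afterwards read only the local SSD copy.
    Returns the number of files copied.
    """
    src, dst = Path(mount_dir), Path(local_dir)
    copied = 0
    for path in src.rglob("*"):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            # copyfile copies data only (no chmod/utime calls on the target).
            shutil.copyfile(path, target)
            copied += 1
    return copied


def stage_out(local_dir: str, mount_dir: str) -> int:
    """Copy processed results back through the mount (roughly one upload per file)."""
    # Same copy in the opposite direction.
    return stage_in(local_dir, mount_dir)
```

For a 25 GB dataset, `gsutil -m cp -r gs://BUCKET/PATH /local/dir` (a parallel copy) is a commonly used alternative that skips the mount entirely. `BUCKET/PATH` is a placeholder.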

The alternative approach is this:

1. Run the preprocessing operations on the data in the bucket itself, through the directory mounted with gcsfuse.
2. Run training with PyTorch directly on the mounted gcsfuse bucket directory, using a Compute Engine instance with very limited storage.
3. Upload the results and delete the Compute Engine instance.
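In the second workflow, the training data loader reads straight from the mount. A PyTorch map-style dataset only needs `__len__` and `__getitem__`. The sketch below, assuming one sample per file, shows that every epoch reopens every file, so on a gcsfuse mount every epoch reads from the bucket again. It is a plain class so it runs without torch installed. In practice you would subclass `torch.utils.data.Dataset`. The class name `FileDataset` and the `load_fn` decoder are hypothetical.

```python
from pathlib import Path
from typing import Callable, List


class FileDataset:
    """Map-style dataset: one sample per file under `root`.

    `root` can be the gcsfuse mount point (workflow 2) or the local SSD
    copy (workflow 1). On a mount, each __getitem__ call reads the file
    through gcsfuse, i.e. from the bucket.
    """

    def __init__(self, root: str, load_fn: Callable[[bytes], object] = bytes):
        # List files once up front; on a mount this lists the bucket contents.
        self.files: List[Path] = sorted(
            p for p in Path(root).rglob("*") if p.is_file()
        )
        self.load_fn = load_fn  # hypothetical decoder, e.g. bytes -> tensor
        # Counts file reads in this process only; with DataLoader
        # num_workers > 0, each worker process keeps its own counter.
        self.reads = 0

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int):
        self.reads += 1
        return self.load_fn(self.files[idx].read_bytes())
```

Iterating over `ds = FileDataset("/mnt/bucket/train")` for three epochs leaves `ds.reads == 3 * len(ds)`. Pointing the same class at the local SSD copy costs the same number of reads, but they never reach the bucket.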

Which of these approaches is recommended? Which will incur fewer charges, and which is used most often for this kind of operation? Is there a different workflow that I'm not seeing?

1 answer

On the billing side, the charges would be the same. According to the documentation, operations performed through gcsfuse are charged like those of any other Cloud Storage interface. I don't know how you are going to train on the data in your use case. However, if you perform more than one operation on each file, it is better to download the files, train locally, and then upload the final result, which would be two object operations (one download and one upload). If you train on the mount and, for example, read or change a file more than once during training, every one of those accesses is a separate object operation. On the workflow side, your proposed workflow looks good to me.
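The argument above can be made concrete with a back-of-the-envelope count of object operations. This is a simplified sketch, not a billing calculator. It assumes one operation per file download, per file upload, and per file read through the mount. It ignores listing and metadata requests, which gcsfuse also issues, as well as any gcsfuse caching. The function name and parameters are hypothetical.

```python
def object_operations(n_files: int, epochs: int, reads_per_epoch: int = 1,
                      upload_results: bool = True) -> dict:
    """Rough count of Cloud Storage object operations for both workflows.

    Simplifying assumptions (not official billing rules): one operation per
    download, per upload, and per read through the gcsfuse mount; listing,
    metadata calls, and caching are ignored.
    """
    uploads = n_files if upload_results else 0
    # Workflow 1: download each file once, train locally, upload once.
    staged = n_files + uploads
    # Workflow 2: every read of every file in every epoch goes to the bucket.
    mounted = n_files * epochs * reads_per_epoch + uploads
    return {"copy_to_local_ssd": staged, "train_on_mount": mounted}


print(object_operations(n_files=10_000, epochs=20))
# {'copy_to_local_ssd': 20000, 'train_on_mount': 210000}
```

With `epochs=1` and `reads_per_epoch=1`, the two counts are equal. The difference only appears when files are accessed more than once, which is the case the answer describes.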

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Accepted answer, 2018-08-02 11:01:33
