Best way to upload a dataset from my PC to a Virtual Machine instance on Google Cloud Platform

I have a big dataset (around 50 GB) for a deep learning experiment. I will train my network on a Virtual Machine instance provided by Google Cloud Platform, so I need to upload my dataset to the virtual machine. I've tried the gcloud command-line tool with the command:

gcloud compute scp --recurse C:\Users\Lenovo\Desktop\dataset root@instance-1:/home/Lenovo/dataset

It works, but it needs around 50 hours to finish.

Is there any way to make this process faster?

I have also stored my dataset on Google Drive. Is it possible to download it directly from Google Drive onto my virtual machine? My virtual machine runs Ubuntu 18.04 LTS.

The time taken will primarily be governed by the slowest link on the network. Let us assume that the network hosting your GCP Virtual Machine (Compute Engine) isn't going to be that link. It is also unlikely that Google Drive is the slowest. Chances are high that the path from your local machine, where you are running gcloud, is the bottleneck. What I would suggest is to log in to your VM on GCP and download your data from there, since I understand it is already on Drive.
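To put a rough number on that bottleneck, here is a small sketch that turns the figures from the question (50 GB in about 50 hours) into an effective transfer rate. The helper names `transfer_rate_mbit` and `transfer_hours` are my own, decimal units are assumed (1 GB = 8000 Mbit), and the 100 Mbit/s figure is only a hypothetical comparison, not a measurement of any real link.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope transfer arithmetic (decimal units: 1 GB = 8000 Mbit).

# transfer_rate_mbit SIZE_GB HOURS -> effective rate in Mbit/s
transfer_rate_mbit() {
  LC_ALL=C awk -v gb="$1" -v h="$2" \
    'BEGIN { printf "%.2f\n", gb * 8000 / (h * 3600) }'
}

# transfer_hours SIZE_GB RATE_MBIT -> hours needed at a given rate in Mbit/s
transfer_hours() {
  LC_ALL=C awk -v gb="$1" -v r="$2" \
    'BEGIN { printf "%.2f\n", gb * 8000 / r / 3600 }'
}

transfer_rate_mbit 50 50    # figures from the question; prints 2.22
transfer_hours 50 100       # hypothetical 100 Mbit/s link; prints 1.11
```

An effective rate of roughly 2 Mbit/s looks much more like a home uplink than anything inside a data center, which is why I would pull the data from Drive directly onto the VM.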

There appear to be a number of ways to achieve that.

  1. Run VNC on your GCP machine. This would give you a GUI environment that runs on GCP but is accessed from your local PC. From there you could install Chrome (on GCP), open your Drive, and start the download.

  2. Download a Drive access tool. An alternative is to install a Drive data access tool on the VM. Here is an example I found with a Google search, but others may work:

https://www.howtoforge.com/tutorial/how-to-access-google-drive-from-linux-gdrive/

Follow the recipes there and download the Drive files onto your GCP VM.

If you need to do further big data work, consider placing your data on Google Cloud Storage; additional options then become available.
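As a minimal sketch of the Cloud Storage route, assuming `gsutil` from the Cloud SDK is installed and authenticated on both machines: the bucket name `gs://my-dataset-bucket` and both paths are hypothetical placeholders, and the `run` helper is my own addition that only prints each command unless `DRY_RUN=0` is set. The upload leg is still limited by your local uplink, although `-m` runs copies in parallel, which can help when the dataset is many small files.

```shell
#!/usr/bin/env bash
# Stage a dataset in a Cloud Storage bucket, then pull it onto the VM.
# BUCKET, LOCAL_DIR and VM_PARENT are hypothetical placeholders.
BUCKET="gs://my-dataset-bucket"
LOCAL_DIR="./dataset"
VM_PARENT="$HOME"

# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run on the local PC: create the bucket and copy the directory into it.
upload_to_bucket() {
  run gsutil mb "$BUCKET"
  run gsutil -m cp -r "$LOCAL_DIR" "$BUCKET/"
}

# Run on the VM: copy the directory from the bucket into VM_PARENT.
download_to_vm() {
  run gsutil -m cp -r "$BUCKET/$(basename "$LOCAL_DIR")" "$VM_PARENT/"
}
```

Calling `upload_to_bucket` on the PC and `download_to_vm` on the VM prints the commands for review; the `gsutil` lines themselves can also be typed directly into the Cloud SDK shell on Windows.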

See transferring big data sets.
