I have a Jupyter notebook (Python) running on Google AI Platform. To read a file into the notebook from Google Cloud Storage, I'm using:
blob = storage.blob.Blob(filename, bucket)
blob.download_to_filename(filename)  # writes the object to a local file; returns None
Is there a simple way to point at a bucket directory and make reading 5K+ images easier, more efficient, and transparent to the pipeline? Thanks, N
The easiest way is to use the gsutil command with parallelism:
!gsutil -m cp gs://<your bucket>/* /<your local path>/
Add the -r flag if the images are also in subdirectories. Here is a video on this.
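If you prefer to stay inside the notebook, the same parallelism can be achieved in Python by listing the blobs and downloading them with a thread pool. A minimal sketch; the `parallel_download` helper works with any fetch function, and the commented usage assumes the google-cloud-storage client with a hypothetical bucket name and prefix:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_download(names, fetch_fn, max_workers=16):
    """Apply fetch_fn to each name concurrently; return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_fn, names))

# Usage with Cloud Storage (assumes google-cloud-storage is installed;
# "my-bucket" and "images/" are placeholders for your bucket and prefix):
# from google.cloud import storage
# client = storage.Client()
# bucket = client.bucket("my-bucket")
# names = [b.name for b in client.list_blobs("my-bucket", prefix="images/")]
# images = parallel_download(names, lambda n: bucket.blob(n).download_as_bytes())
```

Because the listing step uses a prefix, this also answers the "point to a bucket directory" part of the question without shelling out to gsutil.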
If the download is still slow, look at the number of vCPUs your notebook has: bandwidth is limited to 2 Gbps per vCPU, up to 8 vCPUs.
To increase performance further, watch out for hotspots. If your image names are too similar, the same shard serves all of them and you get contention. Here is a video that describes this.
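A common way to avoid such hotspots (a sketch, not part of the original answer) is to spread lexicographically close names across shards by prepending a short hash of each name when you write the objects:

```python
import hashlib

def spread_name(name, prefix_len=4):
    """Prepend a short hex hash so sequential names land on different shards."""
    digest = hashlib.md5(name.encode()).hexdigest()[:prefix_len]
    return f"{digest}_{name}"

# Sequential names such as img_00001.jpg, img_00002.jpg, ... get
# scattered prefixes instead of sharing one lexical range:
print(spread_name("img_00001.jpg"))
print(spread_name("img_00002.jpg"))
```

The trade-off is that listing objects in their original order becomes harder, so this mainly suits write-once, read-many image datasets.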
However, you generally don't need all of the images in your Jupyter notebook. Build and validate your model on a small subset of the data first, then run the real training on a dedicated server.