
Process a file in Google Cloud storage

I have some very large files (100GB) in GCS that need to be processed to remove invalid characters. Downloading them, processing them, and uploading them again takes forever. Does anyone know if it is possible to process them within Google Cloud Platform, eliminating the need for the download/upload?

I am familiar with Python and Cloud functions if those are an option.

As John Hanley said in the comments section, there are no compute features on Cloud Storage, so to process a file you need to download it.

That said, instead of downloading the huge file locally to process it, you can start a Compute Engine VM, download the file there, process it with a Python script (since you have stated that you're familiar with Python), and upload the processed file.

It will probably be quicker to download the file to a Compute Engine VM than to your own computer (it depends on the machine type, though), since the transfer stays on Google's network.
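As a minimal sketch of the processing step on the VM, assuming "invalid characters" means control characters and that the file is line-oriented text, you can stream the file in chunks so that a 100GB file never has to fit in memory (the function and file names here are hypothetical, not from the question):

```python
import re

# Matches control characters other than tab, newline, and carriage return.
# Adjust this pattern to whatever "invalid" means for your data.
INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def clean_file(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream src_path to dst_path, dropping invalid control characters.

    Reads fixed-size chunks (1 MiB by default) so memory use stays
    constant regardless of file size.
    """
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(INVALID.sub("", chunk))
```

Because the pattern is a single-character class, it is safe to apply per chunk; no invalid sequence can straddle a chunk boundary.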

Also, for faster downloads of huge files, you can use some gsutil options:

gsutil \
    -o 'GSUtil:parallel_thread_count=1' \
    -o 'GSUtil:sliced_object_download_max_components=16' \
    cp gs://my-bucket/my-huge-file .

And for faster uploads of huge files, you can use parallel composite uploads:

gsutil \
    -o 'GSUtil:parallel_composite_upload_threshold=150M' \
    cp my-huge-file gs://my-bucket

