What is the most effective way to make an unzipped copy of a gzipped file on GCS?

We have a lot of gzipped files on GCS. To speed up our Dataflow job, we would like to make unzipped copies of the files, because Dataflow's TextIO isn't that fast with gzipped files.

I'm trying to figure out the most effective way to make an unzipped copy of a file on GCS.

As a start, I thought I would write a simple download program, but I can't match the performance that gsutil has.

So an acceptable answer to this question would be an example of how to do a very fast, hopefully simple, download of files from GCS, or how to copy and unzip on the fly within GCS.

You could implement an App Engine or Compute Engine application that processes object change notifications from GCS, so it discovers newly uploaded gzip files and reads/writes the corresponding unzipped file into GCS. This would probably be faster than downloading to your corporate network and reuploading (depending on the speed of your Internet connection).
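The read/write step described above could look roughly like the following. This is only a minimal sketch, not a tuned implementation: the function names `gunzip_stream` and `gunzip_gcs_object` are made up for illustration, it assumes a reasonably recent google-cloud-storage Python client (one that provides `Blob.open`), and it assumes the objects were uploaded as plain .gz files without `Content-Encoding: gzip`. It streams the data in chunks, so the whole file never has to fit in memory or on local disk.

```python
import gzip
import shutil


def gunzip_stream(src, dst, chunk_size=1024 * 1024):
    """Decompress gzip bytes read from file-like `src` into file-like `dst`."""
    with gzip.GzipFile(fileobj=src, mode="rb") as unzipped:
        # Copy in fixed-size chunks so memory use stays bounded.
        shutil.copyfileobj(unzipped, dst, chunk_size)


def gunzip_gcs_object(bucket_name, src_name, dst_name):
    """Hypothetical helper: write an unzipped copy of one GCS object."""
    # Imported lazily so gunzip_stream works without the GCS client installed.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    src_blob = bucket.blob(src_name)
    dst_blob = bucket.blob(dst_name)
    # Stream from the gzipped object straight into the new object.
    with src_blob.open("rb") as src, dst_blob.open("wb") as dst:
        gunzip_stream(src, dst)
```

Run from a Compute Engine instance, a call such as `gunzip_gcs_object("my-bucket", "logs/a.txt.gz", "logs/a.txt")` (bucket and object names are placeholders) would avoid the round trip through your own network.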
