What is the most effective way to make an unzipped copy of a gzipped file on GCS?
We have a lot of gzipped files on GCS. To speed up our Dataflow job we would like to make an unzipped copy of the files, because Dataflow's TextIO isn't that fast with gzipped files.
I'm trying to figure out the most effective way to make an unzipped copy of a file on GCS.
As a start I thought I would just write a simple download program, but I fail to get the same performance as gsutil.
So an accepted answer to this question would be an example of how to do a super fast, hopefully simple, download of files from GCS, or how to copy and unzip on the fly on GCS.
You could implement an App Engine or Compute Engine application that processes object change notifications from GCS, so that it discovers newly uploaded gzip files, reads them, and writes the corresponding unzipped files back to GCS. This would probably be faster than downloading to your corporate network and re-uploading (depending on the speed of your Internet connection).
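
The read-and-write step of such an application could look roughly like the sketch below. It is not taken from the answer above: it assumes the google-cloud-storage Python client is installed, that the objects were uploaded as plain .gz files (without Content-Encoding: gzip, which would make GCS decompress them on download), and that the process runs on a machine inside Google Cloud. The function names, the chunk size, and the rule for naming the output object are all hypothetical choices for illustration.

```python
import gzip
import io
import shutil

CHUNK_SIZE = 1024 * 1024  # 1 MiB per copy step; an arbitrary choice


def gunzip_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Decompress gzip data from binary file-like `src` into `dst`.

    Data is copied chunk by chunk, so the whole file is never held in memory.
    """
    with gzip.GzipFile(fileobj=src, mode="rb") as unzipped:
        shutil.copyfileobj(unzipped, dst, chunk_size)


def unzipped_name(name):
    """Hypothetical naming rule: drop a trailing '.gz', else add a suffix."""
    return name[:-3] if name.endswith(".gz") else name + ".unzipped"


def gunzip_gcs_object(bucket_name, gz_name):
    """Stream one gzipped object into an unzipped copy in the same bucket."""
    # Imported here so the helpers above work without the client library.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    src_blob = bucket.blob(gz_name)
    dst_blob = bucket.blob(unzipped_name(gz_name))
    # Blob.open gives file-like objects that read and write in chunks.
    with src_blob.open("rb") as src, dst_blob.open("wb") as dst:
        gunzip_stream(src, dst)
    return dst_blob.name
```

A notification handler would call gunzip_gcs_object with the bucket and object name taken from the notification. Because the data only passes through the worker as a stream, throughput is bounded by the worker's network link to GCS rather than by its disk or memory.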