
What is the most effective way to make an unzipped copy of a gzipped file on GCS?

We have a lot of gzipped files on GCS. To speed up our Dataflow job, we would like to make unzipped copies of the files, since Dataflow's TextIO isn't that fast with gzipped files.

I'm trying to figure out the most effective way to make an unzipped copy of a file on GCS.

As a start, I thought I would just write a simple download program, but I can't get the same performance as gsutil.

So an accepted answer to this question would be an example of how to do a fast, hopefully simple, download of files from GCS, or how to copy and unzip them on the fly on GCS.
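The "copy and unzip on the fly" option can be sketched in Python. This is a minimal sketch under stated assumptions, not a tested tool: it assumes the `google-cloud-storage` package in a version where `Blob.open()` returns file-like readers and writers, and that the objects are plain `.gz` files stored without `Content-Encoding: gzip` metadata (otherwise GCS may transcode them on download and the bytes would no longer be gzip). `gunzip_stream` and `copy_unzipped` are hypothetical helper names. The data is streamed chunk by chunk, so neither the compressed nor the uncompressed file has to fit in memory or on local disk.

```python
import gzip
from typing import BinaryIO


def gunzip_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> int:
    """Decompress gzip data read from src into dst, one chunk at a time.

    Returns the number of uncompressed bytes written. gzip.GzipFile also
    handles files made of several concatenated gzip members.
    """
    written = 0
    # Closing the GzipFile does not close src; the caller owns both streams.
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:  # b"" signals the end of the compressed data
                break
            dst.write(chunk)
            written += len(chunk)
    return written


def copy_unzipped(bucket_name: str, src_name: str, dst_name: str) -> int:
    """Hypothetical helper: write an unzipped copy of gs://bucket/src to gs://bucket/dst.

    Assumes google-cloud-storage with Blob.open() support and application
    default credentials; imported here so gunzip_stream works without it.
    """
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    with bucket.blob(src_name).open("rb") as reader:
        # The writer uploads in chunks and finalizes the object when it is closed.
        with bucket.blob(dst_name).open("wb") as writer:
            return gunzip_stream(reader, writer)
```

For example, `copy_unzipped("my-bucket", "logs/part-0001.csv.gz", "logs/part-0001.csv")` (bucket and object names are hypothetical) would write the unzipped copy next to the original. Running it on a Compute Engine VM close to the bucket keeps the bytes off your own Internet link.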

You could implement an App Engine or Compute Engine application that processes object change notifications from GCS, so that it discovers newly uploaded gzip files and reads/writes the corresponding unzipped file into GCS. This would probably be faster than downloading to your corporate network and re-uploading (depending on the speed of your Internet connection).
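The notification-driven approach could look roughly like the following. This is a hedged sketch: it assumes the Object Change Notification webhook format, where each POST carries an `X-Goog-Resource-State` header (`sync`, `exists` or `not_exists`) and a JSON object resource with `bucket` and `name` fields; check the current GCS notification documentation before relying on that. `plan_unzip`, `make_handler` and the `process` callback are hypothetical names, and `process` is where a streaming gunzip-and-upload step would go. Only the Python standard library is used.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Optional, Tuple

# (bucket, source object name, destination object name)
UnzipJob = Tuple[str, str, str]


def plan_unzip(resource_state: str, body: bytes) -> Optional[UnzipJob]:
    """Decide from one notification whether an unzipped copy should be made."""
    if resource_state != "exists":  # skip "sync" and deletion ("not_exists") messages
        return None
    obj = json.loads(body)
    name = obj.get("name", "")
    if not name.endswith(".gz"):  # the unzipped output has no .gz suffix, so no loop
        return None
    return obj["bucket"], name, name[: -len(".gz")]


def make_handler(process: Callable[[str, str, str], None]) -> type:
    """Build a request handler class that calls process(bucket, src, dst)."""

    class NotificationHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            job = plan_unzip(self.headers.get("X-Goog-Resource-State", ""), body)
            if job is not None:
                process(*job)
            # Reply 200 so the notification is treated as delivered.
            self.send_response(200)
            self.end_headers()

    return NotificationHandler
```

It could be served with something like `http.server.HTTPServer(("", 8080), make_handler(my_process)).serve_forever()`, where `my_process` is your own function. Doing the decompression inside the request handler keeps the sketch short; for large files a real service would more likely acknowledge the notification quickly and hand the job to a worker.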
