
Python code is taking too long to decompress a file and write it into Google Cloud Storage, compared to the local file system

It's quite weird; all I am trying to do is decompress a file and save it. The file has:

Size: 16 MB
Extension: .json.gz
Source location: Google Cloud Storage
Destination location: Google Cloud Storage / local file system

When I use:

%%time
import gzip
import shutil
import gcsfs

# the gcsfs filesystem handle is not shown in the original post; assumed to be created like this
gcp_file_system = gcsfs.GCSFileSystem()

with gcp_file_system.open('somebucket/<file.json.gz>', 'rb') as fl_:
    with gzip.open(fl_, 'rb') as f_in:
        with gcp_file_system.open('somebucket/<file.json>', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

It produces: Wall time: 5min 51s

But when I try the same code and change the destination to the local machine:

%%time
import gzip
import shutil
import gcsfs
with gcp_file_system.open('somebucket/<file.json.gz>','rb') as fl_:
    with gzip.open(fl_, 'rb') as f_in:        
        with open('localdir/<file.json>','wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

It produces: Wall time: 8.28 s

I am not sure what is playing a role here: the buffer size, network speed, or some gcsfs backend behavior.
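One knob that can matter is the copy buffer: shutil.copyfileobj defaults to a fairly small buffer (64 KiB on most platforms), so every write against the remote gcsfs file is small. A minimal sketch passing a larger buffer, assuming the same gcp_file_system handle as above; the 1 MiB value is an arbitrary figure to experiment with, not a recommendation:

%%time
import gzip
import shutil
import gcsfs

gcp_file_system = gcsfs.GCSFileSystem()  # assumed handle, as in the question

with gcp_file_system.open('somebucket/<file.json.gz>', 'rb') as fl_:
    with gzip.open(fl_, 'rb') as f_in:
        with gcp_file_system.open('somebucket/<file.json>', 'wb') as f_out:
            # the third positional argument is the buffer size used per read/write
            shutil.copyfileobj(f_in, f_out, 1024 * 1024)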

Instead of using a gcsfs file, use the BlobReader class from the GCS client library, for example:

Local Destination:

%%time
import gzip
import shutil
from google.cloud import storage
from google.cloud.storage import fileio

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('file.json.gz')

# BlobReader streams the blob in chunks; the context managers
# close the file, the gzip stream, and the reader in order
with fileio.BlobReader(blob) as reader:
    with gzip.GzipFile(fileobj=reader, mode='rb') as gz:
        with open('localdir/file.json', 'wb') as f_out:
            shutil.copyfileobj(gz, f_out)
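For comparison, when the destination is local anyway, another option is to download the compressed object to disk first and decompress it there. A minimal sketch using the client's standard download_to_filename method, reusing the placeholder bucket and paths from the examples above:

%%time
import gzip
import shutil
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')

# fetch the whole compressed object to disk, then decompress locally
bucket.blob('file.json.gz').download_to_filename('localdir/file.json.gz')
with gzip.open('localdir/file.json.gz', 'rb') as f_in:
    with open('localdir/file.json', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)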

GCS Destination:

%%time
import gzip
import shutil
from google.cloud import storage
from google.cloud.storage import fileio

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob_in = bucket.blob('file.json.gz')
blob_out = bucket.blob('file.json')

# closing the BlobWriter commits the upload; the with blocks handle that
with fileio.BlobReader(blob_in) as reader:
    with gzip.GzipFile(fileobj=reader, mode='rb') as gz:
        with fileio.BlobWriter(blob_out) as writer:
            shutil.copyfileobj(gz, writer)
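If throughput needs further tuning, both BlobReader and BlobWriter accept a chunk_size argument controlling how much data each request to GCS transfers (for uploads it must be a multiple of 256 KiB). A sketch with a larger chunk size; the 10 MiB figure is an assumed starting point to experiment with, not a measured optimum:

%%time
import gzip
import shutil
from google.cloud import storage
from google.cloud.storage import fileio

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
CHUNK = 10 * 1024 * 1024  # 10 MiB per request; an assumption, tune as needed

with fileio.BlobReader(bucket.blob('file.json.gz'), chunk_size=CHUNK) as reader:
    with gzip.GzipFile(fileobj=reader, mode='rb') as gz:
        with fileio.BlobWriter(bucket.blob('file.json'), chunk_size=CHUNK) as writer:
            shutil.copyfileobj(gz, writer)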
