Python code is taking too long to decompress a file and write it into Google Cloud Storage, compared to the local file system
It's quite weird. All I am trying to do is decompress the file and save it. The file has:
size = 16 MB
extension = .json.gz
Source location = Google Cloud Storage
Destination location = Google Cloud Storage / Local File System
When I use
%%time
import gzip
import shutil
import gcsfs

with gcp_file_system.open('somebucket/<file.json.gz>', 'rb') as fl_:
    with gzip.open(fl_, 'rb') as f_in:
        with gcp_file_system.open('somebucket/<file.json>', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
It produces:
Wall time: 5min 51s
But when I try the same and change the destination to the local machine:
%%time
import gzip
import shutil
import gcsfs

with gcp_file_system.open('somebucket/<file.json.gz>', 'rb') as fl_:
    with gzip.open(fl_, 'rb') as f_in:
        with open('localdir/<file.json>', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
It produces:
Wall time: 8.28 s
I am not sure what is playing a role here: the buffer size, the network speed, or something in the gcsfs backend.
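For what it's worth, both of those knobs can be experimented with directly: fsspec/gcsfs accepts a block_size when opening a file, and shutil.copyfileobj accepts an explicit copy buffer length. The sketch below just re-runs the original snippet with an arbitrary 4 MiB value for both, purely as an assumption to benchmark, not a recommendation:

%%time
import gzip
import shutil
import gcsfs

BUF = 4 * 1024 * 1024  # arbitrary 4 MiB buffer, only for experimentation

gcp_file_system = gcsfs.GCSFileSystem()  # same file system object as above
with gcp_file_system.open('somebucket/<file.json.gz>', 'rb', block_size=BUF) as fl_:
    with gzip.open(fl_, 'rb') as f_in:
        with gcp_file_system.open('somebucket/<file.json>', 'wb', block_size=BUF) as f_out:
            # length controls how much data each read/write call moves
            shutil.copyfileobj(f_in, f_out, length=BUF)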
Instead of using a gcsfs file, use the BlobReader class from the GCS client library, for example:
Local Destination
%%time
import gzip
import shutil
from google.cloud import storage
from google.cloud.storage import fileio
storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('file.json.gz')
reader = fileio.BlobReader(blob)
f_out = open('localdir/file.json','wb')
gz = gzip.GzipFile(fileobj=reader, mode="rb")
shutil.copyfileobj(gz, f_out)
f_out.close()
gz.close()
reader.close()
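The same local-destination copy can also be written with context managers, so the reader, the gzip wrapper, and the output file are all closed even if the copy raises. This is only a stylistic sketch of the snippet above:

import gzip
import shutil
from google.cloud import storage
from google.cloud.storage import fileio

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('file.json.gz')

# Each object below is a standard file-like object, so `with` closes them all.
with fileio.BlobReader(blob) as reader, \
        gzip.GzipFile(fileobj=reader, mode="rb") as gz, \
        open('localdir/file.json', 'wb') as f_out:
    shutil.copyfileobj(gz, f_out)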
GCS Destination:
%%time
import gzip
import shutil
from google.cloud import storage
from google.cloud.storage import fileio
storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob_in = bucket.blob('file.json.gz')
reader = fileio.BlobReader(blob_in)
blob_out = bucket.blob('file.json')
writer = fileio.BlobWriter(blob_out)
gz = gzip.GzipFile(fileobj=reader, mode="rb")
shutil.copyfileobj(gz, writer)
gz.close()
reader.close()
writer.close()
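If the GCS-to-GCS copy is still slow, one thing worth experimenting with is the chunk size used for the download and the resumable upload: both BlobReader and BlobWriter accept a chunk_size argument. The 8 MiB value below is an arbitrary assumption to benchmark against the defaults, not a recommendation (upload chunk sizes must be a multiple of 256 KiB):

%%time
import gzip
import shutil
from google.cloud import storage
from google.cloud.storage import fileio

CHUNK = 8 * 1024 * 1024  # arbitrary 8 MiB, a multiple of 256 KiB

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob_in = bucket.blob('file.json.gz')
blob_out = bucket.blob('file.json')

reader = fileio.BlobReader(blob_in, chunk_size=CHUNK)    # download chunk size
writer = fileio.BlobWriter(blob_out, chunk_size=CHUNK)   # resumable-upload chunk size
gz = gzip.GzipFile(fileobj=reader, mode="rb")
shutil.copyfileobj(gz, writer, length=CHUNK)
gz.close()
reader.close()
writer.close()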