
Google Storage // Cloud Function // Python Modify CSV file in the Bucket

Thanks for reading.

I'm having some trouble modifying a CSV file in a Bucket. I know how to copy/rename/move a file, but I have no idea how to modify a file without downloading it to a local machine.

Actually I have a rough idea: download the blob (CSV file) as bytes, modify it, then upload it back to the Bucket as bytes. But I don't understand how to modify the bytes.

What I need to do to the CSV: add a new header column, date, and add a value (today's date) to each row of the CSV.

---INPUT--- CSV file in the Bucket:

a  b
1  2

--OUTPUT--- updated CSV file in the Bucket:

a  b  date
1  2  today

My code:

from datetime import date
from google.cloud import storage

storage_client = storage.Client()

def addDataToCsv(bucket, fileName):
    today = str(date.today())

    bucket = storage_client.get_bucket(bucket)
    blob = bucket.blob(fileName)
    fileNameText = blob.download_as_string()

    # This should be a magic bytes modification

    blobNew = bucket.blob(path + '/' + 'mod.csv')
    blobNew.upload_from_string(fileNameText, content_type='text/csv')


Please help, and thank you for your time and effort.

If I understand correctly, you want to modify the CSV file in the bucket without downloading it to the local machine's file system.

You cannot directly edit a file in a Cloud Storage bucket, aside from its metadata, so you will need to download it to your local machine somehow and push the changes back to the bucket.

Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime.

However, one approach would be to use Cloud Storage FUSE, which mounts a Cloud Storage bucket as a file system so you can edit any file from there, with the changes applied to your bucket.

Still, if that is not a suitable solution for you, the bytes can be downloaded and modified as you propose by decoding the bytes object (commonly with UTF-8, although this depends on your characters) and re-encoding it before uploading:

# Create an array of every CSV file line
csv_array = fileNameText.decode("utf-8").split("\n")
# Add the new header column
csv_array[0] = csv_array[0] + ",date\n"
# Add the date to each remaining row (skip empty trailing lines)
for i in range(1, len(csv_array)):
    if csv_array[i]:
        csv_array[i] = csv_array[i] + "," + today + "\n"
# Re-encode from list to bytes to upload
fileNameText = ''.join(csv_array).encode("utf-8")
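
If you prefer not to splice strings by hand, a minimal alternative sketch (assuming the same fileNameText and today variables as in the code above) uses Python's built-in csv module on the in-memory text:

import csv
import io

# Parse the decoded CSV text into a list of rows
rows = list(csv.reader(io.StringIO(fileNameText.decode("utf-8"))))
# Add the new header column
rows[0].append("date")
# Append today's date to every non-empty data row
for row in rows[1:]:
    if row:
        row.append(today)
# Serialize the rows back to text and re-encode to bytes for upload
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
fileNameText = out.getvalue().encode("utf-8")

The csv module takes care of quoting and delimiters, which a plain split/join does not.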

Take into account that if your local machine has serious storage or performance limitations, or if your CSV is large enough that handling it as above might cause problems, or just for reference, you could use the compose command. For this you would need to modify the code above so that only some sections of the CSV file are edited at a time, uploaded, and then joined with gsutil compose in Cloud Storage.
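
For reference, the Python client library also exposes composition directly, so the uploaded chunks can be joined without calling gsutil. A rough sketch, where the bucket and chunk object names are placeholders:

from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket("my-bucket")  # placeholder bucket name

# Chunks previously uploaded after editing each section of the CSV
parts = [bucket.blob("parts/mod_000.csv"), bucket.blob("parts/mod_001.csv")]

# Server-side concatenation into a single object, no local download needed
bucket.blob("mod.csv").compose(parts)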

Sorry, I know I'm not in your shoes, but if I were you I would try to keep things simple. Indeed, most systems work best when they are kept simple, and they are easier to maintain and share (the KISS principle). Given that you are using your local machine, I assume you have generous network bandwidth and enough disk space and memory, so I would not hesitate to download the file, modify it, and upload it again, even when dealing with big files.
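
To make that concrete, here is a simple download-modify-upload sketch; the bucket and object names are placeholders, and in a Cloud Function only the /tmp directory is writable:

import csv
import os
import tempfile
from datetime import date
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket("my-bucket")  # placeholder bucket name
blob = bucket.blob("data.csv")                   # placeholder object name

# Download the object to a local temporary file
local_path = os.path.join(tempfile.gettempdir(), "data.csv")
blob.download_to_filename(local_path)

# Add the date column and today's value to every row
today = str(date.today())
with open(local_path, newline="") as f:
    rows = list(csv.reader(f))
rows[0].append("date")
for row in rows[1:]:
    if row:
        row.append(today)
with open(local_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Upload the modified file back to the bucket
bucket.blob("mod.csv").upload_from_filename(local_path, content_type="text/csv")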

Then, if you are willing to use another format for the file:

download blob (csv file) as bytes

In this case a better solution for size and code simplicity is to use or convert your file to the Parquet or Avro format. These formats will reduce your file size drastically, especially if you add compression. They also let you keep a structure for your data, which makes modifications much simpler. Finally, there are many resources on the net about how to use these formats with Python, and comparisons between CSV, Avro and Parquet.
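
As an illustration, here is a hedged sketch of the Parquet route using pandas, assuming pandas and pyarrow are installed; the bucket and object names are placeholders:

import io
from datetime import date

import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket("my-bucket")  # placeholder bucket name

# Read the CSV object straight into a DataFrame, in memory
data = bucket.blob("data.csv").download_as_bytes()
df = pd.read_csv(io.BytesIO(data))

# Add the date column with today's value on every row
df["date"] = str(date.today())

# Write Parquet (columnar, compressed) back to the bucket
out = io.BytesIO()
df.to_parquet(out, compression="snappy")
bucket.blob("data.parquet").upload_from_string(out.getvalue())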
