Cloud Function Sending CSV to Cloud Storage

I have a Cloud Function that creates a CSV from an API call and then sends that CSV to Cloud Storage.

Here is my code:

import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Create a csv and load it to google cloud storage
    new_list = new_list.to_csv('/tmp/temp.csv')
    def upload_blob(bucket_name, source_file_name, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_file(source_file_name)

    message = "Data for CSV file"    # ERROR HERE
    csv = open(new_list, "w")
    csv.write(message)
    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)

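I tried writing the file under /tmp so that I could upload it to storage, but without much success. The API call works fine, and I can create the CSV locally. The upload to Cloud Storage is where I get the error.

Any help is much appreciated!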

Instead of using temporary storage in your Cloud Function, try converting your DataFrame to a string and uploading the result to Google Cloud Storage.

Consider for instance:

import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Convert your df to str: it is straightforward, just do not provide
    # any value for the first param path_or_buf
    csv_str = new_list.to_csv()

    # Then, upload it to cloud storage
    def upload_blob(bucket_name, data, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Note the use of upload_from_string here. Please, provide
        # the appropriate content type if you wish
        blob.upload_from_string(data, content_type='text/csv')

    upload_blob('data-exports', csv_str, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)
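
To try the upload step outside a Cloud Function, the nested upload_blob helper above could be pulled out into a standalone function that accepts the storage client as an argument. This is only a sketch: the `client` parameter, the `upload_csv_string` name, and the `build_blob_name` helper are assumptions added for illustration, while `get_bucket`, `blob`, and `upload_from_string` are the calls already used above.

```python
import datetime


def build_blob_name(day=None):
    """Return a dated object name, e.g. 'data-2024-01-31.csv' (hypothetical helper)."""
    day = day or datetime.date.today()
    return "data-" + day.isoformat() + ".csv"


def upload_csv_string(bucket_name, csv_str, destination_blob_name, client=None):
    """Upload CSV text to Cloud Storage without writing to the local disk."""
    if client is None:
        # Imported lazily so the function can be exercised with a stand-in
        # client when google-cloud-storage is not installed.
        from google.cloud import storage
        client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string(csv_str, content_type="text/csv")
    return blob
```

Inside export_data, the call would then look like `upload_csv_string('data-exports', csv_str, build_blob_name())`.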

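As far as I can tell, you have a few problems here.

First, DataFrame.to_csv does not return anything when a file path or buffer is passed as its first argument. So this line writes the file, but it also assigns None to new_list: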

new_list = new_list.to_csv('/tmp/temp.csv')

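To fix this, just drop the assignment: you only need the call new_list.to_csv('/tmp/temp.csv').

That first error causes the later problems, because you cannot open a file at the location None. Pass a string path as the argument to open instead. Also, if you open the file in mode 'w', the CSV data already written there will be overwritten. What output are you after here? Did you mean to append to the file, using 'a'?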
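
The difference in return values can be checked with a small DataFrame. This is a minimal sketch; the column names and the temporary path are made-up sample values, not taken from the question.

```python
import os
import tempfile

import pandas as pd

# Sample data, invented for this illustration.
df = pd.DataFrame({"keyword": ["alpha", "beta"], "rank": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "temp.csv")

    # With a path, to_csv writes the file and returns None.
    result_with_path = df.to_csv(path)
    file_was_written = os.path.exists(path)

# Without a path, to_csv returns the CSV text as a str.
result_without_path = df.to_csv()

print(result_with_path is None)
print(type(result_without_path).__name__)
```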

message = "Data for CSV file"    # ERROR HERE
csv = open(new_list, "w")
csv.write(message)
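
The effect of the two open modes can be seen with a throwaway file. A minimal sketch using only the standard library; the file contents are sample values chosen for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "temp.csv")

    # Start with a small CSV on disk.
    with open(path, "w") as f:
        f.write("keyword,rank\nalpha,1\n")

    # Mode "a" keeps the existing rows and adds to the end of the file.
    with open(path, "a") as f:
        f.write("beta,2\n")
    with open(path) as f:
        after_append = f.read()

    # Mode "w" truncates the file first, so the earlier rows are lost.
    with open(path, "w") as f:
        f.write("Data for CSV file")
    with open(path) as f:
        after_write = f.read()

print(repr(after_append))
print(repr(after_write))
```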

Finally, you pass a file object where a string is expected, this time for the source_file_name parameter of the upload_blob function.


    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

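I think you can skip opening the file here and just pass the file path as the second argument. Note that blob.upload_from_file expects an open file object, so when a path is passed, upload_blob needs to call blob.upload_from_filename instead.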
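
Passing a path could look roughly like the sketch below. The `upload_csv_file` name and the `client` parameter are assumptions added so the function can be tried with a stand-in client; `upload_from_filename` is the google-cloud-storage method that takes a path string, whereas `upload_from_file` takes an open file object.

```python
def upload_csv_file(bucket_name, source_file_name, destination_blob_name, client=None):
    """Upload a CSV file that already exists on disk, identified by its path."""
    if client is None:
        # Imported lazily so the function can be exercised with a stand-in
        # client when google-cloud-storage is not installed.
        from google.cloud import storage
        client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    # source_file_name is a path string such as '/tmp/temp.csv'.
    blob.upload_from_filename(source_file_name, content_type="text/csv")
    return blob
```

With this version, the question's code would call `new_list.to_csv('/tmp/temp.csv')` without the assignment and then `upload_csv_file('data-exports', '/tmp/temp.csv', 'data-' + str(datetime.date.today()) + '.csv')`.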
