Cloud Function Sending CSV to Cloud Storage

I have a Cloud Function that builds a CSV from an API call and then sends that CSV to Cloud Storage.

Here is my code:

import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Create a csv and load it to google cloud storage
    new_list = new_list.to_csv('/tmp/temp.csv')
    def upload_blob(bucket_name, source_file_name, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_file(source_file_name)

    message = "Data for CSV file"    # ERROR HERE
    csv = open(new_list, "w")
    csv.write(message)
    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)

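I tried writing the file under /tmp so that I could upload it to storage, but without much success. The API call works fine, and I can produce the CSV locally. The upload to Cloud Storage is where I get the error.

Any help is much appreciated!

Instead of using temporary storage in your Cloud Function, try converting your DataFrame to a string and uploading the result to Google Cloud Storage.

Consider, for example: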

import requests
import pprint
import pandas as pd
from flatsplode import flatsplode
import csv
import datetime
import schedule
import time
import json
import numpy as np
import os
import tempfile
from google.cloud import storage

api_url = 'https://[YOUR_DOMAIN].com/api/v2/[API_KEY]/keywords/list?site_id=[SITE_ID][&start={start}][&results=100]&format=json'

def export_data(url):
    response = requests.get(url)  # Make a GET request to the URL
    payload = response.json() # Parse `response.text` into JSON
    pp = pprint.PrettyPrinter(indent=1)

    # Use the flatsplode package to quickly turn the JSON response to a DF
    new_list = pd.DataFrame(list(flatsplode(payload)))

    # Drop certain columns from the DF
    idx = np.r_[1:5,14:27,34,35]
    new_list = new_list.drop(new_list.columns[idx], axis=1)

    # Convert your df to str: it is straightforward, just do not provide
    # any value for the first param path_or_buf
    csv_str = new_list.to_csv()

    # Then, upload it to cloud storage
    def upload_blob(bucket_name, data, destination_blob_name):

        storage_client = storage.Client()
        bucket = storage_client.get_bucket(bucket_name)
        blob = bucket.blob(destination_blob_name)
        # Note the use of upload_from_string here. Please, provide
        # the appropriate content type if you wish
        blob.upload_from_string(data, content_type='text/csv')

    upload_blob('data-exports', csv_str, 'data-' + str(datetime.date.today()) + '.csv')

export_data(api_url)

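As far as I can tell, you have a few issues here.

First, DataFrame.to_csv returns nothing if a file path or buffer is given as its argument. So this line writes the file, but it also assigns the value None to new_list: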

new_list = new_list.to_csv('/tmp/temp.csv')

To fix this, just drop the assignment. You only need the line new_list.to_csv('/tmp/temp.csv').
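The difference in return value can be checked with a minimal sketch like the one below. The sample DataFrame and its column names are hypothetical stand-ins for the flattened API response, and a temporary directory replaces /tmp so the snippet runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the flattened API response.
df = pd.DataFrame({"keyword": ["alpha", "beta"], "rank": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "temp.csv")

    # With a path, to_csv writes the file and returns None.
    result_with_path = df.to_csv(csv_path, index=False)
    file_was_written = os.path.exists(csv_path)

# Without a path, to_csv returns the CSV content as a string.
csv_text = df.to_csv(index=False)

print(result_with_path)
print(type(csv_text).__name__)
```

Keeping the DataFrame and the CSV output under separate names avoids overwriting the DataFrame with None.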

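This first mistake causes the later problem, because you cannot write the CSV to the location None. Instead, pass a string as the argument to open. Also, if you open the file in mode 'w', the CSV data already in it will be overwritten. What format do you want here? Did you mean to append to the file, with 'a'?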

message = "Data for CSV file"    # ERROR HERE
csv = open(new_list, "w")
csv.write(message)
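To illustrate the point about open modes, here is a small sketch using only the standard library. The file content is made up, and a temporary directory stands in for /tmp.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "temp.csv")

    # Stand-in for the CSV that to_csv would have written.
    with open(path, "w") as f:
        f.write("keyword,rank\nalpha,1\n")

    # Mode "a" keeps the existing rows and appends after them.
    with open(path, "a") as f:
        f.write("beta,2\n")
    with open(path) as f:
        appended = f.read()

    # Mode "w" truncates the file first, so the earlier rows are lost.
    with open(path, "w") as f:
        f.write("Data for CSV file")
    with open(path) as f:
        overwritten = f.read()

print(repr(appended))
print(repr(overwritten))
```

If the goal is only to upload the exported data, neither write is needed: the file produced by to_csv can be uploaded as it is.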

Finally, you pass a file object where a string (a file name) is expected, this time for the source_file_name parameter of the upload_blob function.


    with open(new_list, 'r') as file_obj:
        upload_blob('data-exports', file_obj, 'data-' + str(datetime.date.today()) + '.csv')

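I think you can skip opening the file here and just pass the file path as the second argument. Note that this means calling blob.upload_from_filename inside upload_blob, since blob.upload_from_file expects an open file object rather than a path.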
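A minimal sketch of the path-based upload might look like the following. The helper names get_bucket, upload_csv_file and dated_blob_name are hypothetical and not part of the original code. The sketch assumes the google-cloud-storage package is installed and credentials are available when get_bucket is called.

```python
import datetime


def get_bucket(bucket_name):
    """Return a bucket handle; needs google-cloud-storage and credentials."""
    # Imported lazily so the helpers below have no hard dependency on the package.
    from google.cloud import storage

    return storage.Client().bucket(bucket_name)


def upload_csv_file(bucket, local_path, destination_blob_name):
    """Upload the file at local_path; the library opens the file itself."""
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(local_path, content_type="text/csv")
    return blob


def dated_blob_name(day):
    """Build the 'data-YYYY-MM-DD.csv' style name used in the question."""
    return "data-" + str(day) + ".csv"
```

Inside the Cloud Function, the call would then be something like upload_csv_file(get_bucket('data-exports'), '/tmp/temp.csv', dated_blob_name(datetime.date.today())), made after new_list.to_csv('/tmp/temp.csv') has written the file.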
