
pandas df from Cloud Functions to BigQuery - only PARQUET and CSV?

I'm querying an API with GCP Cloud Functions and would like to write the result to BigQuery. I'm getting this error:

Got unexpected source_format: 'NEWLINE_DELIMITED_JSON'. Currently, only PARQUET and CSV are supported

This is my code:

from google.cloud import bigquery
import pandas as pd
import requests
import datetime

def hello_pubsub(event, context):

    response = requests.get("https://api.openweathermap.org/data/2.5/weather?q=berlin&appid=12345&units=metric&lang=de")
    responseJson = response.json()

    # Creates DataFrame
    df = pd.DataFrame({'datetime': pd.to_datetime(format(datetime.datetime.now())),
                       'name': str(responseJson['name']),
                       'temp': float(responseJson['main']['temp']),
                       'windspeed': float(responseJson['wind']['speed']),
                       'winddeg': int(responseJson['wind']['deg'])
                       }, index=[0])

    project_id = 'myproj'
    client = bigquery.Client(project=project_id)
    dataset_id = 'weather'

    dataset_ref = client.dataset(dataset_id)
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True
    job_config.write_disposition = "WRITE_APPEND"
    job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
    load_job = client.load_table_from_dataframe(df, dataset_ref.table("weather_de"), job_config=job_config)

What's the best way to do this?

The BigQuery client library reference states that this is intended behavior when loading into a table from a dataframe using load_table_from_dataframe():

By default, this method uses the parquet source format. To override this, supply a value for source_format with the format name. Currently only CSV and PARQUET are supported.
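Given that default, the smallest change is probably to keep load_table_from_dataframe() and simply not set source_format to NEWLINE_DELIMITED_JSON. The sketch below assumes that; append_dataframe is a hypothetical helper name, and the client, dataframe and table ID come from the caller. As far as I know, the Parquet path also needs pyarrow installed in the function's environment, so treat that as an assumption to check.

```python
def append_dataframe(client, df, table_id, job_config=None):
    """Load a DataFrame into BigQuery using the default (Parquet) source format.

    Hypothetical helper: `client` is expected to be a google.cloud.bigquery.Client,
    `df` a pandas DataFrame and `table_id` a "project.dataset.table" string.
    """
    # No source_format is forced here, so the library falls back to Parquet.
    # A caller-supplied job_config may still set write_disposition and so on,
    # but it should not set source_format to NEWLINE_DELIMITED_JSON.
    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish; raises if the load failed
    return job
```

In the question's function this would be called as append_dataframe(client, df, "myproj.weather.weather_de"), optionally passing a LoadJobConfig that only sets write_disposition.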

Something you can try is replacing that method with load_table_from_json(), which is also available and uses NEWLINE_DELIMITED_JSON as the source format. This method will not accept a dataframe as input, so I would recommend using a JSON object to store the data you need from the API response. Otherwise, you can convert the dataframe you already created to JSON using the pandas to_json() method.
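A minimal sketch of both options follows, assuming the same fields as the question's dataframe. The helper names (weather_row, dataframe_to_rows, append_rows) are made up for illustration, and the load relies on the library's defaults for schema detection and appending rather than an explicit job_config, which is an assumption worth verifying against your client version.

```python
import datetime
import json


def weather_row(response_json, now=None):
    """Build one JSON-serializable row straight from the API response."""
    now = now or datetime.datetime.now()
    return {
        "datetime": now.isoformat(),  # ISO 8601 string instead of a Timestamp
        "name": str(response_json["name"]),
        "temp": float(response_json["main"]["temp"]),
        "windspeed": float(response_json["wind"]["speed"]),
        "winddeg": int(response_json["wind"]["deg"]),
    }


def dataframe_to_rows(df):
    """Alternative: turn an existing DataFrame into a list of dicts via to_json()."""
    return json.loads(df.to_json(orient="records", date_format="iso"))


def append_rows(client, rows, table_id):
    """Load a list of dicts with load_table_from_json().

    `client` is expected to be a google.cloud.bigquery.Client and
    `table_id` a "project.dataset.table" string.
    """
    job = client.load_table_from_json(rows, table_id)
    job.result()  # wait for the load job to finish; raises if the load failed
    return job
```

With the question's variables this would look like append_rows(client, [weather_row(responseJson)], "myproj.weather.weather_de"), or append_rows(client, dataframe_to_rows(df), ...) if you prefer to keep the dataframe.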

You can read more about how the BigQuery client works in the reference, and you can also see the built-in source formats.


 