Issues regarding insert_rows_from_dataframe using Big Query
I want to insert a dataframe into a table in GCP; say the table is named table_id. I want to use the following:
insert_rows_from_dataframe(table: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], dataframe, selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict) → Sequence[Sequence[dict]]
I get an error, probably because I am not writing the call correctly. The error message is related to the "schema" name. A schema name has been provided for the table_id I am using. Please help me with an example of using insert_rows_from_dataframe, in particular these parameters:
selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict
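The documentation you quote explains the parameter types, which parameters are optional, and so on. It is generated automatically with Sphinx. I suggest you take some time to understand how this kind of documentation works; it will be useful.
If you click through to the method's source code, you can see what the method actually needs: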
def insert_rows_from_dataframe(
    self,
    table: Union[Table, TableReference, str],
    dataframe,
    selected_fields: Sequence[SchemaField] = None,
    chunk_size: int = 500,
    **kwargs: Dict,
) -> Sequence[Sequence[dict]]:
    """Insert rows into a table from a dataframe via the streaming API.

    Args:
        table (Union[ \
            google.cloud.bigquery.table.Table, \
            google.cloud.bigquery.table.TableReference, \
            str, \
        ]):
            The destination table for the row data, or a reference to it.
        dataframe (pandas.DataFrame):
            A :class:`~pandas.DataFrame` containing the data to load. Any
            ``NaN`` values present in the dataframe are omitted from the
            streaming API request(s).
        selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]):
            The fields to return. Required if ``table`` is a
            :class:`~google.cloud.bigquery.table.TableReference`.
        chunk_size (int):
            The number of rows to stream in a single chunk. Must be positive.
        kwargs (Dict):
            Keyword arguments to
            :meth:`~google.cloud.bigquery.client.Client.insert_rows_json`.

    Returns:
        Sequence[Sequence[Mappings]]:
            A list with insert errors for each insert chunk. Each element
            is a list containing one mapping per row with insert errors:
            the "index" key identifies the row, and the "errors" key
            contains a list of the mappings describing one or more problems
            with the row.

    Raises:
        ValueError: if table's schema is not set
    """
    insert_results = []

    chunk_count = int(math.ceil(len(dataframe) / chunk_size))
    rows_iter = _pandas_helpers.dataframe_to_json_generator(dataframe)

    for _ in range(chunk_count):
        rows_chunk = itertools.islice(rows_iter, chunk_size)
        result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
        insert_results.append(result)

    return insert_results
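To make the chunking in that method body easier to follow, here is a minimal standalone sketch of the same pattern (`math.ceil` to count chunks, `itertools.islice` to pull each chunk from one shared iterator). The helper name `chunk_rows` is hypothetical and not part of the BigQuery client; it works on any iterable and sends nothing to BigQuery.

```python
import itertools
import math


def chunk_rows(rows, chunk_size=500):
    """Split rows into lists of at most chunk_size items (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    rows = list(rows)
    chunk_count = int(math.ceil(len(rows) / chunk_size))
    rows_iter = iter(rows)
    chunks = []
    for _ in range(chunk_count):
        # islice consumes the next chunk_size items from the shared iterator,
        # so each pass continues where the previous one stopped.
        chunks.append(list(itertools.islice(rows_iter, chunk_size)))
    return chunks


chunks = chunk_rows(range(1200), chunk_size=500)
print([len(c) for c in chunks])  # [500, 500, 200]
```

With the default chunk_size of 500, a dataframe of 1200 rows would therefore be streamed in three requests, and the returned list would have three elements.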
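So an example of using this method would be:
from google.cloud import bigquery

bq_client = bigquery.Client()
# PROJECT, DATASET and TABLE are placeholders for your own identifiers.
# get_table() returns a Table whose schema is set, so selected_fields can be omitted.
table = bq_client.get_table("{}.{}.{}".format(PROJECT, DATASET, TABLE))
dataframe = yourDataFrame  # the pandas DataFrame you want to insert
# The call returns one list of row errors per chunk; empty lists mean no errors.
errors = bq_client.insert_rows_from_dataframe(table, dataframe)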
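Because the method reports row-level problems in its return value, it can help to flatten that nested list before inspecting it. The sketch below follows the return shape described in the docstring (one list per chunk, each entry a mapping with "index" and "errors" keys). The helper name `flatten_insert_errors` is hypothetical, the sample data is hand-written rather than real API output, and the code assumes that "index" counts rows within its own chunk.

```python
def flatten_insert_errors(insert_results, chunk_size=500):
    """Return (row_position, errors) pairs from the nested per-chunk result."""
    failures = []
    for chunk_number, chunk_errors in enumerate(insert_results):
        for row_error in chunk_errors:
            # Assumption: "index" is relative to the chunk, so the chunk
            # offset is added to recover the position in the dataframe.
            position = chunk_number * chunk_size + row_error["index"]
            failures.append((position, row_error["errors"]))
    return failures


# Hand-written sample shaped like the documented return value.
sample = [
    [],
    [{"index": 3, "errors": [{"reason": "invalid", "message": "example"}]}],
]
for position, errors in flatten_insert_errors(sample, chunk_size=500):
    print(position, errors)
```

An empty result from the helper would mean every chunk was accepted without row errors.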
To pass the schema to the insert_rows_from_dataframe() function, you need to pass it in the selected_fields parameter. As you mentioned, its type is Sequence[google.cloud.bigquery.schema.SchemaField].
So first you have to import this class:
from google.cloud.bigquery.schema import SchemaField
Then, for the actual call:
client.insert_rows_from_dataframe(table=table_id, dataframe=df, selected_fields=schema)
where:
table_id is <dataset_name>.<table_name>
df is the dataframe, whose columns (and data types) match those of the table
schema is a list of SchemaField objects
Schema example:
[SchemaField(name="birth_date", field_type="DATE", mode="REQUIRED"),
SchemaField(name="user_name", field_type="STRING", mode="REQUIRED"),
SchemaField(name="score", field_type="INTEGER", mode="NULLABLE"),
...]
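Since the dataframe columns have to match the schema, a quick check before inserting can catch mismatches early. This is only a sketch: `check_columns` is a hypothetical helper, and `Field` is a stand-in namedtuple used so the example runs without the BigQuery library. It relies only on the `name` and `mode` attributes, which real SchemaField objects also expose, and it does not compare data types.

```python
from collections import namedtuple

# Stand-in for google.cloud.bigquery.schema.SchemaField; only the .name and
# .mode attributes are used below.
Field = namedtuple("Field", ["name", "field_type", "mode"])


def check_columns(columns, schema):
    """Compare dataframe column names with schema field names (hypothetical helper)."""
    columns = set(columns)
    names = {field.name for field in schema}
    # REQUIRED fields that the dataframe does not provide.
    missing_required = sorted(
        field.name
        for field in schema
        if field.mode == "REQUIRED" and field.name not in columns
    )
    # Dataframe columns that the schema does not know about.
    unknown = sorted(columns - names)
    return missing_required, unknown


schema = [
    Field("birth_date", "DATE", "REQUIRED"),
    Field("user_name", "STRING", "REQUIRED"),
    Field("score", "INTEGER", "NULLABLE"),
]
# In practice the first argument would be list(df.columns).
print(check_columns(["user_name", "score", "nickname"], schema))
# (['birth_date'], ['nickname'])
```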