Question about using BigQuery insert_rows_from_dataframe

Question

I want to insert a dataframe into a table in GCP; let's say the table's name is table_id. I want to use the following:

insert_rows_from_dataframe(table: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], dataframe, selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict) → Sequence[Sequence[dict]]

I took this from the documentation: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.insert_rows_from_dataframe

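I'm getting an error, probably because I haven't written the call correctly. The error mentions the "schema" name. The table_id I'm using does have its schema field names defined. Please help me with an example of using insert_rows_from_dataframe, specifically these parameters: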

selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict

Answer 1

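The documentation you cite explains the parameter types, which parameters are optional, and so on. It is generated automatically with Sphinx; I suggest getting familiar with how this kind of documentation works, because it will be useful.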

If you click through to the method's source code, you can see what it actually requires:

def insert_rows_from_dataframe(
    self,
    table: Union[Table, TableReference, str],
    dataframe,
    selected_fields: Sequence[SchemaField] = None,
    chunk_size: int = 500,
    **kwargs: Dict,
) -> Sequence[Sequence[dict]]:
    """Insert rows into a table from a dataframe via the streaming API.

    Args:
        table (Union[ \
            google.cloud.bigquery.table.Table, \
            google.cloud.bigquery.table.TableReference, \
            str, \
        ]):
            The destination table for the row data, or a reference to it.
        dataframe (pandas.DataFrame):
            A :class:`~pandas.DataFrame` containing the data to load. Any
            ``NaN`` values present in the dataframe are omitted from the
            streaming API request(s).
        selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]):
            The fields to return. Required if ``table`` is a
            :class:`~google.cloud.bigquery.table.TableReference`.
        chunk_size (int):
            The number of rows to stream in a single chunk. Must be positive.
        kwargs (Dict):
            Keyword arguments to
            :meth:`~google.cloud.bigquery.client.Client.insert_rows_json`.

    Returns:
        Sequence[Sequence[Mappings]]:
            A list with insert errors for each insert chunk. Each element
            is a list containing one mapping per row with insert errors:
            the "index" key identifies the row, and the "errors" key
            contains a list of the mappings describing one or more problems
            with the row.

    Raises:
        ValueError: if table's schema is not set
    """
    insert_results = []

    chunk_count = int(math.ceil(len(dataframe) / chunk_size))
    rows_iter = _pandas_helpers.dataframe_to_json_generator(dataframe)

    for _ in range(chunk_count):
        rows_chunk = itertools.islice(rows_iter, chunk_size)
        result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
        insert_results.append(result)

    return insert_results
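The loop above is what `chunk_size` controls: the dataframe is turned into a single row generator, and each call to `insert_rows` streams the next `chunk_size` rows from it. The stand-alone sketch below reproduces only that chunking step with plain dicts, so it runs without BigQuery credentials. The function name `chunk_rows` is hypothetical and not part of the library, and the positive-size check is added here because the docstring says `chunk_size` must be positive.

```python
import itertools
import math
from typing import Iterable, Iterator, List


def chunk_rows(rows: Iterable[dict], total: int, chunk_size: int = 500) -> List[List[dict]]:
    """Split rows into chunks the same way the loop above does (hypothetical helper).

    `total` plays the role of len(dataframe): it decides how many chunks are cut.
    """
    if chunk_size <= 0:
        # The docstring says chunk_size must be positive; fail early instead of dividing by zero.
        raise ValueError("chunk_size must be positive")
    chunk_count = int(math.ceil(total / chunk_size))
    # One shared iterator, like the generator in the source: each islice call
    # continues where the previous chunk stopped instead of starting over.
    rows_iter: Iterator[dict] = iter(rows)
    chunks = []
    for _ in range(chunk_count):
        chunks.append(list(itertools.islice(rows_iter, chunk_size)))
    return chunks


# 1,200 rows with the default chunk_size of 500 become three requests.
rows = [{"id": i} for i in range(1200)]
chunks = chunk_rows(rows, total=len(rows))
print([len(c) for c in chunks])  # [500, 500, 200]
```

Wrapping the input in `iter()` matters: calling `islice` directly on a list would restart from the first row every time.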

So an example of using this method is:

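from google.cloud import bigquery

bq_client = bigquery.Client()
# PROJECT, DATASET and TABLE are your own project, dataset and table names.
# get_table() returns a Table object, whose schema is then used for the insert.
table = bq_client.get_table("{}.{}.{}".format(PROJECT, DATASET, TABLE))
dataframe = yourDataFrame  # the pandas DataFrame to insert

# Returns one list of row errors per chunk; all lists empty means every row was accepted.
errors = bq_client.insert_rows_from_dataframe(table, dataframe)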
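The call returns one list of errors per chunk (see "Returns" in the docstring above), and an empty list for a chunk means all of its rows were accepted. A small helper such as the hypothetical `failed_rows` below can turn that nested result into something easier to act on. It assumes, as the source above suggests, that each chunk goes out as a separate `insert_rows` request, so each "index" counts from the start of its own chunk; the sample result is hand-made to match the documented shape, not real API output.

```python
from typing import Dict, Sequence


def failed_rows(insert_results: Sequence[Sequence[dict]], chunk_size: int = 500) -> Dict[int, list]:
    """Map each failed row's 0-based position in the dataframe to its error list.

    Assumes "index" is relative to its chunk and that chunk_size matches the
    value used for the insert (500 by default).
    """
    failures: Dict[int, list] = {}
    for chunk_no, chunk_errors in enumerate(insert_results):
        for entry in chunk_errors:
            # Convert the chunk-relative index into a position in the whole dataframe.
            position = chunk_no * chunk_size + entry["index"]
            failures[position] = entry["errors"]
    return failures


# Hand-made result shaped like the docstring describes: chunk 0 clean, row 3 of chunk 1 rejected.
sample = [[], [{"index": 3, "errors": [{"reason": "invalid"}]}]]
print(failed_rows(sample, chunk_size=500))  # {503: [{'reason': 'invalid'}]}
```

The keys are row positions, so the offending rows can be looked up with `dataframe.iloc[...]` rather than by index label.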

Answer 2

To pass the schema to the insert_rows_from_dataframe() function, you need to pass it in the selected_fields parameter. As you mentioned, its type is Sequence[google.cloud.bigquery.schema.SchemaField].

So first you have to import this class:

from google.cloud.bigquery.schema import SchemaField

Then, for the actual call:

client.insert_rows_from_dataframe(table=table_id, dataframe=df, selected_fields=schema)

where:

table_id is <dataset_name>.<table_name>
df is the dataframe, whose columns (and data types) match those of the table
schema is a list of SchemaField objects

Example schema:

[SchemaField(name="birth_date", field_type="DATE", mode="REQUIRED"),
 SchemaField(name="user_name", field_type="STRING", mode="REQUIRED"),
 SchemaField(name="score", field_type="INTEGER", mode="NULLABLE"),
 ...]
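Putting this together with the other two parameters the question asks about gives something like the sketch below; the function names `stream_dataframe` and `example_schema` and the table name are hypothetical. The schema matters because when `table` is a plain string, the client only knows the table's name, not its columns, so without `selected_fields` the call fails with the error listed in the docstring above ("ValueError: if table's schema is not set"), which is a likely cause of the error in the question. If you pass a Table obtained from `get_table()`, as in Answer 1, its schema is used and `selected_fields` can be left out. `chunk_size` sets how many rows go into each streaming request, and extra keyword arguments are forwarded to `insert_rows_json()`; `skip_invalid_rows` and `ignore_unknown_values` are two of its parameters.

```python
from typing import Any, Sequence


def stream_dataframe(client: Any, table_id: str, df: Any, schema: Sequence[Any],
                     chunk_size: int = 500) -> list:
    """Stream df into table_id with an explicit schema (hypothetical helper).

    `client` is expected to be a google.cloud.bigquery.Client; it is passed in
    rather than created here so the function can be tried without credentials.
    """
    return client.insert_rows_from_dataframe(
        table_id,                  # "dataset.table" or "project.dataset.table"
        df,
        selected_fields=schema,    # needed because table_id is only a string
        chunk_size=chunk_size,     # rows per streaming request
        # Forwarded through **kwargs to insert_rows_json(); False matches the
        # service's default behaviour, set True to change it.
        skip_invalid_rows=False,
        ignore_unknown_values=False,
    )


def example_schema() -> list:
    # Imported lazily so this module loads even where google-cloud-bigquery is absent.
    from google.cloud.bigquery.schema import SchemaField
    return [
        SchemaField(name="birth_date", field_type="DATE", mode="REQUIRED"),
        SchemaField(name="user_name", field_type="STRING", mode="REQUIRED"),
        SchemaField(name="score", field_type="INTEGER", mode="NULLABLE"),
    ]
```

Assuming credentials and an existing table, a call could look like `errors = stream_dataframe(bigquery.Client(), "my_dataset.my_table", df, example_schema(), chunk_size=200)`.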


Answer 1: score 0, posted 2021-10-05 11:52:42
Answer 2: score 0, posted 2022-12-25 17:32:07
