I want to insert a dataframe into a BigQuery table in GCP; say the table's ID is table_id. I want to use the following method:
insert_rows_from_dataframe(table: Union[google.cloud.bigquery.table.Table, google.cloud.bigquery.table.TableReference, str], dataframe, selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict) → Sequence[Sequence[dict]]
I got it from the documentation: https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.insert_rows_from_dataframe
I am getting errors, probably because I'm not calling it properly. The errors mention the schema. The schema is defined for table_id, which is the table I am using. Could you please provide a sample example using insert_rows_from_dataframe, particularly showing these parameters:
selected_fields: Optional[Sequence[google.cloud.bigquery.schema.SchemaField]] = None, chunk_size: int = 500, **kwargs: Dict
The documentation you linked explains which types are expected, which parameters are optional, and so on. It is auto-generated with Sphinx; I recommend learning how to read this kind of documentation, as it will be useful.
If you click on the source code of the method you can see what the method is actually expecting:
def insert_rows_from_dataframe(
    self,
    table: Union[Table, TableReference, str],
    dataframe,
    selected_fields: Sequence[SchemaField] = None,
    chunk_size: int = 500,
    **kwargs: Dict,
) -> Sequence[Sequence[dict]]:
    """Insert rows into a table from a dataframe via the streaming API.
    Args:
        table (Union[ \
            google.cloud.bigquery.table.Table, \
            google.cloud.bigquery.table.TableReference, \
            str, \
        ]):
            The destination table for the row data, or a reference to it.
        dataframe (pandas.DataFrame):
            A :class:`~pandas.DataFrame` containing the data to load. Any
            ``NaN`` values present in the dataframe are omitted from the
            streaming API request(s).
        selected_fields (Sequence[google.cloud.bigquery.schema.SchemaField]):
            The fields to return. Required if ``table`` is a
            :class:`~google.cloud.bigquery.table.TableReference`.
        chunk_size (int):
            The number of rows to stream in a single chunk. Must be positive.
        kwargs (Dict):
            Keyword arguments to
            :meth:`~google.cloud.bigquery.client.Client.insert_rows_json`.
    Returns:
        Sequence[Sequence[Mappings]]:
            A list with insert errors for each insert chunk. Each element
            is a list containing one mapping per row with insert errors:
            the "index" key identifies the row, and the "errors" key
            contains a list of the mappings describing one or more problems
            with the row.
    Raises:
        ValueError: if table's schema is not set
    """
    insert_results = []
    chunk_count = int(math.ceil(len(dataframe) / chunk_size))
    rows_iter = _pandas_helpers.dataframe_to_json_generator(dataframe)
    for _ in range(chunk_count):
        rows_chunk = itertools.islice(rows_iter, chunk_size)
        result = self.insert_rows(table, rows_chunk, selected_fields, **kwargs)
        insert_results.append(result)
    return insert_results
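The chunking loop above can be reproduced without BigQuery to see how the rows are split. A minimal sketch: a plain list of row dicts stands in for _pandas_helpers.dataframe_to_json_generator(dataframe), and chunk_rows is a hypothetical helper that collects each chunk instead of passing it to self.insert_rows.

```python
import itertools
import math


def chunk_rows(rows, chunk_size=500):
    """Split rows into ceil(len(rows) / chunk_size) chunks, mirroring the loop above."""
    chunk_count = int(math.ceil(len(rows) / chunk_size))
    rows_iter = iter(rows)  # stand-in for dataframe_to_json_generator(dataframe)
    chunks = []
    for _ in range(chunk_count):
        # islice keeps consuming the same iterator, so each chunk starts
        # where the previous one stopped.
        chunks.append(list(itertools.islice(rows_iter, chunk_size)))
    return chunks


# Hypothetical rows, only to illustrate the split.
rows = [{"user_name": f"user{i}", "score": i} for i in range(1203)]
chunks = chunk_rows(rows)
print([len(c) for c in chunks])  # [500, 500, 203]
```

With the default chunk_size of 500, 1,203 rows become three chunks, and therefore three separate insert_rows calls (streaming requests) in the real method.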
So an example of using this method would be:
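from google.cloud import bigquery

bq_client = bigquery.Client()
# PROJECT, DATASET and TABLE are your own identifiers; get_table() fetches
# the table metadata, including its schema.
table = bq_client.get_table("{}.{}.{}".format(PROJECT, DATASET, TABLE))
dataframe = yourDataFrame  # a pandas DataFrame whose columns match the table schema
# Returns one list of row errors per chunk; all-empty lists mean success.
errors = bq_client.insert_rows_from_dataframe(table, dataframe)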
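The return value is a list of per-chunk error lists, so a result containing only empty lists means every row was accepted. A minimal sketch of mapping errors back to dataframe rows; failed_row_positions is a hypothetical helper, and it assumes that each "index" counts from the start of its own chunk (each chunk is sent by a separate insert_rows call in the source above) and that you pass the same chunk_size that was used for the insert.

```python
def failed_row_positions(insert_results, chunk_size=500):
    """Map per-chunk insert errors to positional row numbers in the dataframe.

    insert_results is what insert_rows_from_dataframe returns: one list per
    chunk, each holding {"index": ..., "errors": [...]} mappings.
    Assumption: "index" is relative to its chunk.
    """
    positions = {}
    for chunk_number, chunk_errors in enumerate(insert_results):
        for row_error in chunk_errors:
            position = chunk_number * chunk_size + row_error["index"]
            positions[position] = row_error["errors"]
    return positions


# Hypothetical result for 1,203 rows: only row 3 of the second chunk failed.
example = [
    [],
    [{"index": 3, "errors": [{"reason": "invalid", "message": "hypothetical error"}]}],
    [],
]
print(failed_row_positions(example))
# {503: [{'reason': 'invalid', 'message': 'hypothetical error'}]}
```

The keys are positions, not index labels, so df.iloc[list(positions)] would select the failed rows even if the dataframe has a non-default index.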
In order to pass the schema to the insert_rows_from_dataframe() function, you need to pass it in the selected_fields parameter. As you mentioned, it is of type Sequence[google.cloud.bigquery.schema.SchemaField].
So first you have to import this class:
from google.cloud.bigquery.schema import SchemaField
Then, for the actual call:
client.insert_rows_from_dataframe(table=table_id, dataframe=df, selected_fields=schema)
Where:
- table_id is <dataset_name>.<table_name>
- df is the dataframe with columns (and datatypes) matching those of the table schema
- schema is a list of SchemaField objects

An example of a schema:
schema = [SchemaField(name="birth_date", field_type="DATE", mode="REQUIRED"),
SchemaField(name="user_name", field_type="STRING", mode="REQUIRED"),
SchemaField(name="score", field_type="INTEGER", mode="NULLABLE"),
...]