
Google BigQuery: In Python, column addition makes all the other columns Nullable

I have a table that already exists with the following schema:

{
  "schema": {
    "fields": [
      {
        "mode": "required",
        "name": "full_name",
        "type": "string"
      },
      {
        "mode": "required",
        "name": "age",
        "type": "integer"
      }]
  }
}

It already contains entries like:

{'full_name': 'John Doe',
 'age': int(33)}

I want to insert a new record with a new field and have the load job automatically add the new column as it loads. The new format looks like this:

record = {'full_name': 'Karen Walker',
          'age': int(48),
          'zipcode': '63021'}

My code is as follows:

from google.cloud import bigquery
client = bigquery.Client(project=projectname)
table = client.get_table(table_id)

config = bigquery.LoadJobConfig()
config.autodetect = True
config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
config.schema_update_options = [
    bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
                               ]

job = client.load_table_from_json([record], table, job_config=config)
job.result()

This results in the following error:

400 Provided Schema does not match Table my_project:my_dataset:mytable. Field age has changed mode from REQUIRED to NULLABLE

I can fix this by changing config.schema_update_options as follows:

config.schema_update_options = [
    bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
    bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION
]

This allows me to insert the new record, with zipcode added to the schema, but it causes both full_name and age to become NULLABLE, which is not the behavior I want. Is there a way to prevent schema auto-detect from changing the existing columns?

If you need to add fields to your schema, you can do the following:

from google.cloud import bigquery
client = bigquery.Client()

table = client.get_table("your-project.your-dataset.your-table")

original_schema = table.schema   # Get your current table's schema
new_schema = original_schema[:]  # Creates a copy of the schema.
# Add new field to schema
new_schema.append(bigquery.SchemaField("new_field", "STRING")) 

# Set new schema in your table object
table.schema = new_schema   
# Call API to update your table with the new schema
table = client.update_table(table, ["schema"])  

After updating your table's schema, you can load your new records containing the additional field without setting any schema update options, so the existing REQUIRED columns keep their mode.
