
BigQuery converts my string field to an integer while loading a JSON file with Python

I am loading the following data into my BigQuery table using bigquery.LoadJobConfig in Python: {"number":"1234123"}. The number column in my BigQuery table has type STRING, but when I run the load operation, the column's data type in the table is converted to INTEGER. How can I solve this? The file type I loaded: JSON.

job_config = bigquery.LoadJobConfig(
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)

Additionally: when I set autodetect to False, I get the following error: Error while reading data, error message: JSON table encountered too many errors

To prevent this, I recommend passing an explicit BigQuery schema instead of using autodetect=True. For example:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the destination table.
table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("number", "STRING")
    ],
    create_disposition=bigquery.CreateDisposition.CREATE_IF_NEEDED,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=False
)
# Public sample file; replace with the URI of your own newline-delimited JSON file.
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri,
    table_id,
    location="US",  # Must match the destination dataset location.
    job_config=job_config,
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

In this example I set the schema of the BigQuery table explicitly and set autodetect to False. If you set autodetect to True, you have no control over your field types.
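
If typing out the schema by hand is impractical, one possible approach is to derive it from a sample record, so that quoted JSON values stay STRING. The sketch below is only an illustration: the helper names schema_from_sample and build_job_config are hypothetical, and it assumes each line is a flat JSON object whose values are strings, numbers, or booleans (nulls, arrays, and nested objects raise an error).

```python
import json
from typing import Dict, List

# JSON value type -> BigQuery column type. A quoted value such as "1234123"
# is a Python str here, so it maps to STRING rather than INTEGER.
_TYPE_MAP = {str: "STRING", bool: "BOOLEAN", int: "INTEGER", float: "FLOAT"}


def schema_from_sample(line: str) -> List[Dict[str, str]]:
    """Build a schema (list of field dicts) from one newline-delimited JSON record."""
    record = json.loads(line)
    schema = []
    for name, value in record.items():
        # Exact type lookup, so True/False map to BOOLEAN and not INTEGER.
        field_type = _TYPE_MAP.get(type(value))
        if field_type is None:
            raise ValueError(f"Unsupported value for field {name!r}: {value!r}")
        schema.append({"name": name, "type": field_type, "mode": "NULLABLE"})
    return schema


def build_job_config(schema_dicts: List[Dict[str, str]]):
    """Turn the field dicts into a LoadJobConfig with autodetect disabled."""
    # Imported here so schema_from_sample can be used without the client library.
    from google.cloud import bigquery

    return bigquery.LoadJobConfig(
        schema=[bigquery.SchemaField.from_api_repr(d) for d in schema_dicts],
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=False,
    )
```

For the record from the question, schema_from_sample('{"number":"1234123"}') yields a single STRING field named number, which build_job_config then passes to the load job in place of autodetection.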

You can check the documentation for more information.
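
If a load still fails with "JSON table encountered too many errors", that message is only a summary. The load job's errors attribute usually lists the individual problems. A minimal sketch of printing them follows; the helper names format_load_errors and run_load_and_report are hypothetical, and the code assumes each error entry is a mapping with the reason, location, and message keys.

```python
from typing import List, Mapping, Optional, Sequence


def format_load_errors(errors: Optional[Sequence[Mapping[str, str]]]) -> List[str]:
    """Turn a load job's error entries into readable one-line messages."""
    lines = []
    for err in errors or []:
        reason = err.get("reason", "unknown")
        location = err.get("location", "")
        message = err.get("message", "")
        prefix = f"[{reason}]" + (f" {location}:" if location else "")
        lines.append(f"{prefix} {message}")
    return lines


def run_load_and_report(load_job) -> None:
    """Wait for a load job; if it fails, print each detailed error and re-raise."""
    try:
        load_job.result()
    except Exception:
        # load_job.errors holds the per-row or per-field details behind the summary.
        for line in format_load_errors(load_job.errors):
            print(line)
        raise
```

Calling run_load_and_report(load_job) in place of load_job.result() prints the detailed entries before the exception propagates, which often shows which field or row the load rejected.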
