Loading data from GCS to BigQuery using Wildcards and Autodetect
New to posting on StackOverflow.
Using the google.cloud.bigquery Python SDK, I have been trying to work up a solution to load data from GCS to BigQuery without defining a table schema. My LoadJobConfig's autodetect is set to True, and I am using a wildcard (*) in the GCS URI. I have confirmed that autodetect works with wildcards, but the load job fails: the data source I am working with usually gets a specific column autodetected as a float (e.g. 0.30), but sometimes the source adds operator symbols (e.g. < 0.10), so that column needs to be a string.
Can anyone think of a solution without having to define the schema? Here's the LoadJobConfig that I've passed to bigquery.client.Client's load_table_from_uri method:
source_uri = 'gs://%s/%s/%s/*' % (source, report_type, date)
job_config = bigquery.LoadJobConfig()
job_config.create_disposition = 'CREATE_IF_NEEDED'
job_config.skip_leading_rows = 1
job_config.source_format = 'CSV'
job_config.write_disposition = 'WRITE_TRUNCATE'
job_config.autodetect = True
job = bigquery_client.load_table_from_uri(source_uri, table_ref, job_config=job_config)
job.result()
Your data seems to be broken in some part. I would suggest using the --max_bad_records flag, which skips broken records.
For details please look here: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#bigquery-import-gcs-file-python