
Loading data from GCS to BigQuery using Wildcards and Autodetect

New to posting on StackOverflow.

Using the google.cloud.bigquery Python SDK, I have been trying to work out a way to load data from GCS into BigQuery without defining a table schema.

My LoadJobConfig's autodetect is set to True, and I am using a wildcard (*) in the GCS URI.

I have confirmed that autodetect works with wildcards, but the load job fails. In the data source I am working with, one specific column is usually autodetected as a float (e.g. 0.30), but it sometimes contains operator symbols (e.g. < 0.10) and therefore needs to be a string.
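To illustrate why the mixed column breaks the load, here is a minimal, stdlib-only sketch of sample-based type inference. It is not BigQuery's actual algorithm; the function name infer_column_type and the sample values are hypothetical. The point is only that if the inspected rows contain nothing but values like 0.30, the column looks like a float, and a later row containing < 0.10 no longer fits that type.

```python
def infer_column_type(sample_values):
    """Return 'FLOAT' if every sampled value parses as a float, else 'STRING'.

    Simplified illustration only; BigQuery's real autodetect logic is more involved.
    """
    for value in sample_values:
        try:
            float(value)
        except ValueError:
            return "STRING"
    return "FLOAT"


# A sample that happens to contain only plain numbers looks like a FLOAT column.
print(infer_column_type(["0.30", "0.25", "1.10"]))
# Once a value with an operator symbol is in the sample, only STRING fits.
print(infer_column_type(["0.30", "< 0.10"]))
```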

Can anyone think of a solution that does not require defining the schema? Here is the LoadJobConfig that I pass to bigquery.client.Client's load_table_from_uri method:

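from google.cloud import bigquery

# bigquery_client, table_ref, source, report_type and date are defined elsewhere in the script.
source_uri = 'gs://%s/%s/%s/*' % (source, report_type, date)

job_config = bigquery.LoadJobConfig()
job_config.create_disposition = 'CREATE_IF_NEEDED'
job_config.skip_leading_rows = 1  # skip the CSV header row
job_config.source_format = 'CSV'
job_config.write_disposition = 'WRITE_TRUNCATE'  # replace any existing table data
job_config.autodetect = True  # let BigQuery infer the schema

job = bigquery_client.load_table_from_uri(source_uri, table_ref, job_config=job_config)
job.result()  # block until the load job finishes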
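One possible workaround, which is not from the original post, is to avoid hand-writing the schema by deriving it from the CSV header and loading every column as STRING, then casting the values later in SQL. The sketch below assumes the header line has already been read from one of the files and that its names are valid BigQuery column names; string_schema_from_header and to_schema_fields are hypothetical helper names. bigquery.SchemaField and LoadJobConfig.schema belong to the google-cloud-bigquery library, which is imported lazily here so that the header parsing can be run on its own.

```python
import csv


def string_schema_from_header(header_line):
    """Parse a CSV header line into (column_name, 'STRING') pairs."""
    column_names = next(csv.reader([header_line]))
    return [(name.strip(), "STRING") for name in column_names]


def to_schema_fields(pairs):
    """Convert (name, type) pairs into bigquery.SchemaField objects."""
    # Imported here so the parsing helper above works without the library installed.
    from google.cloud import bigquery

    return [bigquery.SchemaField(name, field_type) for name, field_type in pairs]


# Hypothetical header line, for illustration only.
pairs = string_schema_from_header("sample_id,result,unit\n")
print(pairs)

# With google-cloud-bigquery installed, the pairs could replace autodetect:
#   job_config.autodetect = False
#   job_config.schema = to_schema_fields(pairs)
```

Loading everything as STRING trades type safety at load time for a load that cannot fail on mixed values such as 0.30 and < 0.10.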

Part of your data seems to be broken.

I would suggest using the --max_bad_records flag, which lets the load job skip up to that many bad records instead of failing.

For details, please look here: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#bigquery-import-gcs-file-python
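In the Python SDK used in the question, the counterpart of this flag is the max_bad_records property on LoadJobConfig. A minimal sketch follows; the helper name build_load_config and the default limit of 10 are arbitrary assumptions. Note that skipped rows are not loaded into the table, so this approach drops the < 0.10 rows rather than keeping them.

```python
def build_load_config(max_bad_records=10):
    """Build a CSV LoadJobConfig that tolerates a limited number of bad rows."""
    # Requires the google-cloud-bigquery package; imported lazily on purpose.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.skip_leading_rows = 1
    job_config.autodetect = True
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    # Rows that fail to parse are skipped until this limit is exceeded,
    # at which point the job fails.
    job_config.max_bad_records = max_bad_records
    return job_config
```

The returned object can be passed as job_config to load_table_from_uri in the same way as in the question.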
