BigQuery: add columns to table schema
I am trying to add a new column to an existing BigQuery table. I have tried both the bq command-line tool and the API. I get the following error when calling Tables.update().
I have also tried providing the full schema with the additional field, and that gives me the same error, shown below.
With the API I get the following error:
{
  "schema": {
    "fields": [{
      "name": "added_column",
      "type": "integer",
      "mode": "nullable"
    }]
  }
}
{
  "error": {
    "errors": [{
      "domain": "global",
      "reason": "invalid",
      "message": "Provided Schema does not match Table [blah]"
    }],
    "code": 400,
    "message": "Provided Schema does not match Table [blah]"
  }
}
With the bq tool I get the following error:
./bq update -t blah added_column:integer
BigQuery error in update operation: Provided Schema does not match Table [blah]
Try this:
bq --format=prettyjson show yourdataset.yourtable > table.json
Edit table.json and remove everything except the contents of "fields" (that is, keep the [ { "name": "x" ... }, ... ] array), then add your new field to the schema. If you have jq installed, the following command extracts the fields array for you, so that you only need to add the new field:
bq --format=prettyjson show yourdataset.yourtable | jq .schema.fields > table.json
Then run:
bq update yourdataset.yourtable table.json
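If jq is not available, the extract-and-append step can also be scripted. The following is only a sketch and not part of the original answer: it assumes the output of bq --format=prettyjson show has been saved to a file, and the function names and file paths are hypothetical.

```python
import json


def add_field_to_schema(table_json, new_field):
    """Return the table's existing schema fields plus new_field.

    table_json is the parsed output of `bq --format=prettyjson show`;
    only its "schema" -> "fields" list is used. The input is not modified.
    """
    fields = list(table_json["schema"]["fields"])
    # BigQuery treats column names case-insensitively, so compare in lower case.
    if any(f["name"].lower() == new_field["name"].lower() for f in fields):
        raise ValueError(f"column {new_field['name']!r} already exists")
    fields.append(new_field)
    return fields


def rewrite_schema_file(src_path, dst_path, new_field):
    """Read the saved `bq show` output and write the schema file for `bq update`."""
    with open(src_path) as src:
        table_json = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(add_field_to_schema(table_json, new_field), dst, indent=2)
```

For example, rewrite_schema_file("table_full.json", "table.json", {"name": "added_column", "type": "INTEGER", "mode": "NULLABLE"}) would produce a table.json that can be passed to bq update as shown above.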
You can add --apilog=apilog.txt to the beginning of the command line, which will show exactly what is sent to and returned from the BigQuery server.
In my case I was trying to add a REQUIRED field to a template table and ran into this error. Changing the field to NULLABLE let me update the table.
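This matches BigQuery's rule that columns added to an existing table must be NULLABLE or REPEATED. A small pre-flight check along the following lines can catch a REQUIRED addition before the update is attempted. It is a sketch only: the function name is hypothetical, and fields are assumed to be dicts in the bq schema-file format.

```python
def find_invalid_new_fields(old_fields, new_fields):
    """Return names of top-level fields that are new but not NULLABLE or REPEATED."""
    old_names = {f["name"].lower() for f in old_fields}
    invalid = []
    for field in new_fields:
        if field["name"].lower() in old_names:
            continue  # Column already exists, so it is not an addition.
        # A field without an explicit mode is treated as NULLABLE.
        mode = field.get("mode", "NULLABLE").upper()
        if mode not in ("NULLABLE", "REPEATED"):
            invalid.append(field["name"])
    return invalid
```

An empty result means every added top-level column uses a mode that BigQuery accepts for schema additions. Nested fields are not inspected by this sketch.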
Also, here is a more recent version of the update commands, for anybody arriving from Google.
#To create table
bq mk --schema domain:string,pageType:string,source:string -t Project:Dataset.table
#Or using schema file
bq mk --schema SchemaFile.json -t Project:Dataset.table
#SchemaFile.json format
[
  {
    "mode": "REQUIRED",
    "name": "utcTime",
    "type": "TIMESTAMP"
  },
  {
    "mode": "REQUIRED",
    "name": "domain",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "testBucket",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "isMobile",
    "type": "BOOLEAN"
  },
  {
    "mode": "REQUIRED",
    "name": "Category",
    "type": "RECORD",
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "Type",
        "type": "STRING"
      },
      {
        "mode": "REQUIRED",
        "name": "Published",
        "type": "BOOLEAN"
      }
    ]
  }
]
# To update
bq update --schema UpdatedSchema.json -t Project:Dataset.table
# UpdatedSchema.json must contain the old columns plus any newly added columns
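Because the updated schema file has to keep every existing column, including fields nested inside RECORD columns, it can help to compare the old and new schema files before running the update. The helper below is a hypothetical sketch that assumes both schemas are lists of dicts in the schema-file format shown above.

```python
def missing_fields(old_fields, new_fields, prefix=""):
    """Return dotted paths of fields present in old_fields but absent from new_fields."""
    new_by_name = {f["name"].lower(): f for f in new_fields}
    missing = []
    for old in old_fields:
        path = prefix + old["name"]
        new = new_by_name.get(old["name"].lower())
        if new is None:
            missing.append(path)
        elif old.get("fields"):
            # RECORD column: its nested fields must be preserved as well.
            missing.extend(
                missing_fields(old["fields"], new.get("fields", []), path + ".")
            )
    return missing
```

An empty list suggests the new schema is a superset of the old one by name. The sketch does not compare types or modes.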
Example using the BigQuery Node.js API:
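// Assumes the @google-cloud/bigquery package; run this inside an async function.
const {BigQuery} = require('@google-cloud/bigquery');
const bigQuery = new BigQuery();

// New REPEATED RECORD column with two nested fields.
const fieldDefinition = {
  name: 'nestedColumn',
  type: 'RECORD',
  mode: 'REPEATED',
  fields: [
    {name: 'id', type: 'INTEGER', mode: 'NULLABLE'},
    {name: 'amount', type: 'INTEGER', mode: 'NULLABLE'},
  ],
};

const table = bigQuery.dataset('dataset1').table('source_table_name');
// getMetadata() resolves to an array whose first element is the table metadata.
const [metaData] = await table.getMetadata();
// Append the new column to the existing fields and send the full schema back.
const fields = metaData.schema.fields;
fields.push(fieldDefinition);
await table.setMetadata({schema: {fields}});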
I was stuck trying to add columns to an existing table in BigQuery using the Python client and landed on this post several times. I'll leave the piece of code that solved it for me here, in case someone is having the same problem:
from google.cloud import bigquery

# update table schema
# dataset_id and table_id are assumed to be defined by the caller.
bigquery_client = bigquery.Client()
dataset_ref = bigquery_client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
table = bigquery_client.get_table(table_ref)
# Copy the existing schema and append the new column to it.
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField('LOLWTFMAN','STRING'))
table.schema = new_schema
table = bigquery_client.update_table(table, ['schema']) # API request
Here's a quick snippet I wrote that dynamically adds schema columns if the incoming data (from a server, etc.) has keys that don't exist yet in the BigQuery table:
from google.cloud import bigquery

def verify_schema(client, table, data_dict):
    """Add a NULLABLE STRING column for every key of data_dict missing from the table."""
    schema = list(table.schema)
    existing_names = {field.name for field in schema}
    missing = [name for name in data_dict if name not in existing_names]
    if missing:
        schema.extend(
            bigquery.SchemaField(name=name, field_type='STRING', mode='NULLABLE')
            for name in missing
        )
        table.schema = schema
        client.update_table(table, ['schema'])  # API request
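The snippet above adds every new column as STRING. If more specific types are wanted, one option is to guess them from the incoming values. The mapping below is an assumption made for illustration (the function name is hypothetical, and the choice of TIMESTAMP for datetime values is a judgment call), so treat it as a starting point only.

```python
import datetime


def infer_bq_type(value):
    """Guess a BigQuery column type name for a Python value (falls back to STRING)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    # datetime must be tested before date because datetime subclasses date.
    if isinstance(value, datetime.datetime):
        return "TIMESTAMP"
    if isinstance(value, datetime.date):
        return "DATE"
    return "STRING"
```

The returned name could be passed as the field_type when building the new schema field, in place of the hard-coded 'STRING'.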