Get BigQuery table schema using google.cloud
For example, I can pull BigQuery data into local Python with:
import os
from google.cloud import bigquery
project_id = "example-project"
dataset_id = "example_dataset"
table_id = "table_id"
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()
query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)  # run_sync_query exists only in google-cloud-bigquery releases before 0.28
resp.run()
data_list = resp.rows
The result:
print(data_list)
>>> [('BEDD', '1',), ('A75', '1',), ('CE3F', '1',), ('0D8C', '1',), ('3E9C', '1',)]
How do I then get the schema for this table? For example, something like:
headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}
You can use the schema attribute of your resp variable.
After running the query, you can retrieve it:
schema = resp.schema
schema will be a list containing the definition of each column in your query.
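If you only need the table's own schema rather than the schema of a query result, here is a minimal sketch under one assumption: your installed google-cloud-bigquery release has Client.get_table and accepts a "project.dataset.table" string. The helper names get_table_schema and column_names are made up for illustration; the client is passed in so nothing runs against BigQuery until you call it.

```python
def get_table_schema(client, table_ref):
    """Return the list of SchemaField objects describing a table.

    client    -- a google.cloud.bigquery.Client (assumption: a release
                 where Client.get_table accepts a string table ID)
    table_ref -- a string such as "project.dataset.table"
    """
    return client.get_table(table_ref).schema


def column_names(schema):
    """Return the top-level column names as a tuple."""
    return tuple(field.name for field in schema)


# Usage (needs credentials, so shown as comments only):
# from google.cloud import bigquery
# client = bigquery.Client(project="example-project")
# schema = get_table_schema(client, "example-project.example_dataset.table_id")
# headings = column_names(schema)
```

This reads only the table metadata, so no query has to be run just to learn the column names.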
To see what resp.schema holds, let's say this is your query:
query = "select '1' as fv, STRUCT<i INT64, j INT64> (1, 2) t from `dataset.table` limit 1"
The schema will be a list containing 2 entries:
[<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6e50>,
<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6b10>]
Each object in schema has the attributes field_type, fields, mode and name, so if you run:
schema[0].field_type, schema[0].mode, schema[0].name
The result is "STRING", "NULLABLE", "fv".
As the second column is a record, if you run:
schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields
The result is:
"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]
Here google schema 1 and google schema 2 are themselves schema objects, each containing the definition of one inner field of the record (i and j in this query).
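To make the nesting concrete, here is a small sketch that walks such a schema recursively. It does not talk to BigQuery: Field is a hypothetical namedtuple stand-in that only mimics the four attributes named above (name, field_type, mode, fields), and the sample values are illustrative, not actual query output.

```python
from collections import namedtuple

# Hypothetical stand-in for the schema objects; it is not the real
# SchemaField class, only something with the same four attributes.
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])

# Illustrative schema shaped like the example query: one string column
# and one record column holding two inner fields.
schema = [
    Field("fv", "STRING", "NULLABLE", ()),
    Field("t", "RECORD", "NULLABLE", (
        Field("i", "INTEGER", "NULLABLE", ()),
        Field("j", "INTEGER", "NULLABLE", ()),
    )),
]


def flatten(fields, prefix=""):
    """Yield (dotted_path, field_type, mode) for every column, depth first."""
    for field in fields:
        path = prefix + field.name
        yield path, field.field_type, field.mode
        if field.fields:
            # Descend into the record's inner fields.
            yield from flatten(field.fields, prefix=path + ".")


columns = list(flatten(schema))
for column in columns:
    print(column)
```

The same flatten function should work on a real schema list, since it relies only on those four attributes.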
As far as I know, there's no way of getting a dictionary like the one in your question directly, which means you'll have to loop over the entries in schema and build it yourself. It should be simple, though. I haven't fully tested this, so I'm not sure it works, but it might give you an idea of how to do it:
def extract_schema(schema_resp):
    l = []
    for schema_obj in schema_resp:
        r = {}
        r['name'] = schema_obj.name
        r['type'] = schema_obj.field_type
        r['mode'] = schema_obj.mode
        if schema_obj.fields:
            r['fields'] = extract_schema(schema_obj.fields)
        l.append(r)
    return l
So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go.
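Since the function above was not fully tested, here is a sketch that exercises the same logic offline and produces both shapes asked for in the question. Field is a hypothetical stand-in for the real schema objects (same four attributes only), and the two sample columns are made up to match the question's headings.

```python
from collections import namedtuple

# Hypothetical stand-in for the schema objects returned by the library.
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])


def extract_schema(schema_resp):
    """Convert a list of schema objects into a list of plain dicts."""
    result = []
    for schema_obj in schema_resp:
        entry = {
            "name": schema_obj.name,
            "type": schema_obj.field_type,
            "mode": schema_obj.mode,
        }
        if schema_obj.fields:
            # Records carry their inner fields, converted the same way.
            entry["fields"] = extract_schema(schema_obj.fields)
        result.append(entry)
    return result


# Made-up sample matching the two string columns in the question.
sample = [
    Field("heading1", "STRING", "NULLABLE", ()),
    Field("heading2", "STRING", "NULLABLE", ()),
]

headings = tuple(field.name for field in sample)
schema_dict = {"fields": extract_schema(sample)}

print(headings)
print(schema_dict)
```

Each entry also carries a 'mode' key, which the dictionary in the question did not have; drop it if you don't need it. Newer releases of google-cloud-bigquery also expose a to_api_repr() method on SchemaField that returns a similar dictionary per field, so check whether your installed version has it before writing your own.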