Getting a BigQuery table schema using google.cloud

Question

I can, for example, get BigQuery data into local Python with:

import os
from google.cloud import bigquery

project_id = "example-project"
dataset_id = "example_dataset"
table_id = "table_id"

# The client picks up the default project from this environment variable
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()

# Run the query synchronously and collect the returned rows
query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)
resp.run()
data_list = resp.rows

The result:

print(data_list)
>>> [('BEDD', '1',), ('A75', '1',), ('CE3F', '1',), ('0D8C', '1',), ('3E9C', '1',)]

How do I then get the schema for this table, so that I end up with, for example:

headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}

Answer 1

You can use the schema attribute of your resp variable.

After running the query, you can retrieve it:

schema = resp.schema

schema will be a list containing the definition of each column in your query.

As an example, let's say this is your query:

query = "select '1' as fv, STRUCT<i INT64, j INT64> (1, 2) t  from `dataset.table` limit 1"

The schema will be a list containing 2 entries:

[<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6e50>,
 <google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6b10>]

Each object in schema has the attributes field_type, fields, mode and name, so if you run:

schema[0].field_type, schema[0].mode, schema[0].name

The result is "STRING", "NULLABLE", "fv".
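
To get the headings tuple from the question, collecting the name of each entry should be enough. The sketch below is only an illustration: FakeField is a hypothetical stand-in that exposes the same four attributes as SchemaField so the snippet runs without a BigQuery connection, and column_names is a helper name invented here. With a real query you would pass resp.schema instead.

```python
from collections import namedtuple

# Hypothetical stand-in with the same attribute names as SchemaField;
# it is not part of google.cloud.bigquery.
FakeField = namedtuple("FakeField", ["name", "field_type", "mode", "fields"])

# Example schema matching the two STRING columns from the question.
schema = [
    FakeField("heading1", "STRING", "NULLABLE", ()),
    FakeField("heading2", "STRING", "NULLABLE", ()),
]


def column_names(schema):
    """Return the top-level column names as a tuple, in schema order."""
    return tuple(field.name for field in schema)


headings = column_names(schema)
print(headings)
```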

Since the second column is a record, if you run:

schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields

The result is:

"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]

Here google schema 1 and google schema 2 are themselves SchemaField objects that hold the definitions of the inner fields of the record (i and j in this query).

As far as I know, there's no way to get a dictionary like the one in your question directly, which means you'll have to loop over the entries in schema and build it yourself. It should be simple, though. I haven't fully tested the following, so I'm not sure it works as is, but it should give you an idea of how to do it:

def extract_schema(schema_resp):
    l = []
    for schema_obj in schema_resp:
        r = {}
        r['name'] = schema_obj.name
        r['type'] = schema_obj.field_type
        r['mode'] = schema_obj.mode
        if schema_obj.fields:
            r['fields'] = extract_schema(schema_obj.fields)
        l.append(r)
    return l

So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go.
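
As a rough check of the idea, here is a self-contained sketch that runs the same recursion over stand-in objects and wraps the result in the {'fields': [...]} shape from the question. FakeField is a hypothetical substitute for SchemaField (same attribute names only), and the "INTEGER" type strings for the inner fields are assumed for illustration rather than taken from a real response.

```python
from collections import namedtuple

# Hypothetical stand-in with the same attribute names as SchemaField;
# it is not part of google.cloud.bigquery.
FakeField = namedtuple("FakeField", ["name", "field_type", "mode", "fields"])


def extract_schema(schema_resp):
    """Convert a sequence of field objects into a list of plain dicts."""
    result = []
    for field in schema_resp:
        entry = {
            "name": field.name,
            "type": field.field_type,
            "mode": field.mode,
        }
        # RECORD columns carry nested fields, so recurse into them.
        if field.fields:
            entry["fields"] = extract_schema(field.fields)
        result.append(entry)
    return result


# Mirrors the example query: a STRING column and a RECORD with two inner fields.
schema = [
    FakeField("fv", "STRING", "NULLABLE", ()),
    FakeField("t", "RECORD", "NULLABLE", (
        FakeField("i", "INTEGER", "NULLABLE", ()),  # type string assumed
        FakeField("j", "INTEGER", "NULLABLE", ()),  # type string assumed
    )),
]

schema_dict = {"fields": extract_schema(schema)}
print(schema_dict)
```

Leaf columns get no 'fields' key at all, because the recursion only runs when the nested sequence is non-empty.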

Answer 1 · 3 · accepted · 2017-01-31 15:56:17
