Get BigQuery table schema using google.cloud

I can, for example, get BigQuery data into local Python with:

import os
from google.cloud import bigquery

project_id = "example-project"
dataset_id = "example_dataset"
table_id = "table_id"

os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()

query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)
resp.run()
data_list = resp.rows

The result:

print(data_list)
# [('BEDD', '1'), ('A75', '1'), ('CE3F', '1'), ('0D8C', '1'), ('3E9C', '1')]

How do I then get the schema for this table, so that I end up with, for example:

headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}

You can use the schema attribute of your resp variable.

After running the query, you can retrieve it:

schema = resp.schema

schema will be a list containing the definition of each column in your query.

As an example, let's say this is your query:

query = "select '1' as fv, STRUCT<i INT64, j INT64> (1, 2) t  from `dataset.table` limit 1"

The schema will be a list containing 2 entries:

[<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6e50>,
 <google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6b10>]

Each object in schema has the attributes field_type, fields, mode and name, so if you run:

schema[0].field_type, schema[0].mode, schema[0].name

The result is "STRING", "NULLABLE", "fv".

As the second column is a record, if you run:
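If all you need is the headings tuple from the question, the name attribute is enough. Here is a minimal sketch; Field is a hypothetical stand-in that only mimics the attribute names of SchemaField, so it runs without a BigQuery connection, and the column names heading1 and heading2 are taken from the question:

```python
from collections import namedtuple

# Hypothetical stand-in exposing the same attribute names as SchemaField.
Field = namedtuple("Field", ["name", "field_type", "mode"])

# In real code this would be resp.schema.
schema = [
    Field("heading1", "STRING", "NULLABLE"),
    Field("heading2", "STRING", "NULLABLE"),
]
# In real code this would be resp.rows.
data_list = [("BEDD", "1"), ("A75", "1")]

# Column names in query order, as a tuple.
headings = tuple(field.name for field in schema)

# Pair every row with the headings to get one dict per row.
records = [dict(zip(headings, row)) for row in data_list]
```

With the real response you would replace the two hand-built lists by resp.schema and resp.rows; the two comprehensions stay the same.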

schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields

The result is:

"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]

Here google schema 1 and google schema 2 are themselves SchemaField objects, each holding the definition of one inner field of the record.
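To make the nesting concrete, here is a small sketch that walks a schema shaped like the one above and collects dotted paths for the leaf columns. Field is again a hypothetical stand-in with the same attribute names as SchemaField, and flatten_names is a helper name made up for this illustration:

```python
from collections import namedtuple

# Hypothetical stand-in exposing the same attribute names as SchemaField.
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])


def flatten_names(schema, prefix=""):
    """Return dotted paths for every non-record column, depth first."""
    names = []
    for field in schema:
        path = prefix + field.name
        if field.fields:
            # A record: descend into its inner fields.
            names.extend(flatten_names(field.fields, path + "."))
        else:
            names.append(path)
    return names


# Mirrors the example query: a string column and a record with two integers.
schema = [
    Field("fv", "STRING", "NULLABLE", ()),
    Field("t", "RECORD", "NULLABLE", (
        Field("i", "INTEGER", "NULLABLE", ()),
        Field("j", "INTEGER", "NULLABLE", ()),
    )),
]

leaf_names = flatten_names(schema)
```

For this schema leaf_names comes out as ['fv', 't.i', 't.j'].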

As far as I know, there's no way of getting a dictionary like the one you showed in your question directly, which means you'll have to loop over the entries in schema and build it yourself. It should be simple, though. I haven't fully tested this, but it should give you an idea of how to do it:

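def extract_schema(schema_resp):
    """Recursively convert a list of SchemaField objects into a list of dicts."""
    result = []
    for schema_obj in schema_resp:
        entry = {
            'name': schema_obj.name,
            'type': schema_obj.field_type,
            'mode': schema_obj.mode,
        }
        # RECORD columns carry their inner fields in .fields; recurse into them.
        if schema_obj.fields:
            entry['fields'] = extract_schema(schema_obj.fields)
        result.append(entry)
    return result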

So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go. To match the shape in your question, wrap the result as {'fields': schema}.
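One caveat: run_sync_query belongs to older releases of google-cloud-bigquery. As far as I can tell, in newer releases (0.28 onward) the equivalent is client.query(sql).result(), whose return value also has a schema attribute, and SchemaField has a to_api_repr() method that returns a dict per field. The sketch below assumes that newer API; schema_from_query is a made-up helper name, and the client is passed in so nothing here needs credentials to be defined:

```python
def schema_from_query(client, sql):
    """Run sql and return (headings, schema_dict) for its result columns.

    Assumes `client` behaves like google.cloud.bigquery.Client in the
    newer library releases: query() returns a job, and job.result()
    returns an iterator with a .schema list of SchemaField objects.
    """
    rows = client.query(sql).result()
    schema = rows.schema

    # Column names in query order.
    headings = tuple(field.name for field in schema)

    # to_api_repr() gives the REST-style dict for each field,
    # including nested 'fields' for RECORD columns.
    schema_dict = {"fields": [field.to_api_repr() for field in schema]}
    return headings, schema_dict


# Usage against a real project (needs credentials, so left commented out):
# from google.cloud import bigquery
# client = bigquery.Client(project="example-project")
# headings, schema_dict = schema_from_query(
#     client, "SELECT * FROM example_dataset.table_id LIMIT 5")
```

The dicts returned by to_api_repr() may contain more keys than name, type and mode (for example a description), so filter them if you need exactly the shape from the question.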
