Get BigQuery table schema using google.cloud

For example, I can pull BigQuery data into a local Python session with:

import os
from google.cloud import bigquery

project_id = "example-project"
dataset_id = "example_dataset"
table_id = "table_id"

os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()

query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)
resp.run()
data_list = resp.rows

The result:

print(data_list)
>>> [('BEDD', '1',), ('A75', '1',), ('CE3F', '1',), ('0D8C', '1',), ('3E9C', '1',)]

How do I then get the schema for this table, so that I end up with something like:

headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}

You can use the schema attribute of your resp variable.

After running the query you can retrieve it:

schema = resp.schema

schema will be a list containing the definition for each column in your query.

As an example, let's say this is your query:

query = "select '1' as fv, STRUCT<i INT64, j INT64> (1, 2) t  from `dataset.table` limit 1"

The schema will be a list containing 2 entries:

[<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6e50>,
 <google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6b10>]

Each object in schema has the attributes field_type, fields, mode and name, so if you run:

schema[0].field_type, schema[0].mode, schema[0].name

The result is "STRING", "NULLABLE", "fv".

Since the second column is a record, if you run:

schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields

The result is:

"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]

Here google schema 1 and google schema 2 are SchemaField objects holding the definitions of the inner fields of the record.
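To make the nesting concrete without a BigQuery connection, here is a minimal sketch that walks a schema shaped like the one above. Field is a hypothetical stand-in for SchemaField that only mimics the four attributes mentioned (name, field_type, mode, fields), column_paths is an illustrative helper rather than part of the library, and the type strings are placeholder values:

```python
from collections import namedtuple

# Hypothetical stand-in for google.cloud.bigquery.schema.SchemaField,
# exposing only the four attributes discussed above.
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])

# Hand-built schema mirroring the example query: a string column and a record.
schema = [
    Field("fv", "STRING", "NULLABLE", ()),
    Field("t", "RECORD", "NULLABLE", (
        Field("i", "INTEGER", "NULLABLE", ()),
        Field("j", "INTEGER", "NULLABLE", ()),
    )),
]


def column_paths(fields, prefix=""):
    """Return dotted names for every leaf column, descending into records."""
    paths = []
    for field in fields:
        full_name = prefix + field.name
        if field.fields:
            # A record: recurse into its inner fields.
            paths.extend(column_paths(field.fields, full_name + "."))
        else:
            paths.append(full_name)
    return paths


# Top-level column names, in the tuple form asked for in the question.
headings = tuple(field.name for field in schema)
print(headings)              # ('fv', 't')
print(column_paths(schema))  # ['fv', 't.i', 't.j']
```

With a real query result, the same two expressions should work on resp.schema, since they rely only on the name and fields attributes.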

As far as I know, there's no way to get a dictionary like the one in your question directly, so you'll have to loop over the entries in schema and build it yourself. It should be simple, though. I haven't fully tested the following, but it should give you an idea of how to do it:

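def extract_schema(schema_resp):
    """Recursively convert a list of SchemaField objects into a list of dicts."""
    result = []
    for schema_obj in schema_resp:
        entry = {
            'name': schema_obj.name,
            'type': schema_obj.field_type,
            'mode': schema_obj.mode,
        }
        # RECORD columns carry nested SchemaField objects in .fields
        if schema_obj.fields:
            entry['fields'] = extract_schema(schema_obj.fields)
        result.append(entry)
    return result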

So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go.
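As a quick offline check of that idea, the sketch below runs the same recursive conversion against stand-in objects and wraps the result in the {'fields': [...]} shape from the question. Field is a hypothetical substitute for SchemaField (only the four attributes used here), and the sample columns are the placeholder names from the question, not real query output:

```python
from collections import namedtuple

# Hypothetical stand-in for SchemaField; not part of google.cloud.bigquery.
Field = namedtuple("Field", ["name", "field_type", "mode", "fields"])


def extract_schema(schema_resp):
    """Recursively convert schema field objects into plain dictionaries."""
    result = []
    for schema_obj in schema_resp:
        entry = {
            "name": schema_obj.name,
            "type": schema_obj.field_type,
            "mode": schema_obj.mode,
        }
        if schema_obj.fields:
            entry["fields"] = extract_schema(schema_obj.fields)
        result.append(entry)
    return result


# Two flat string columns, as in the question's desired output.
schema = [
    Field("heading1", "STRING", "NULLABLE", ()),
    Field("heading2", "STRING", "NULLABLE", ()),
]

schema_dict = {"fields": extract_schema(schema)}
headings = tuple(entry["name"] for entry in schema_dict["fields"])

print(headings)  # ('heading1', 'heading2')
print(schema_dict)
```

Each entry also carries a 'mode' key, which the dictionary in the question did not include; drop it from entry if you want an exact match.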
