
How to get column name from BigQuery API?

I can get column values using the following code:

import os
from google.cloud.bigquery import Client

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'C:\Users\xxx\Desktop\key.json'
bq_client = Client()
query = "SELECT msts, coreuserid, spend_usd FROM `project.f_purchase` where dt = '2019-04-02' limit 5"
query_job = bq_client.query(query)
results = query_job.result()

for row in results:
    print("{}, {}, {}".format(row.msts, row.coreuserid, row.spend_usd))

But as shown in the last line, this requires the column names to be written out directly. Now I have multiple queries, and I want to run them in a loop and display the results. Is there a way to do something like .format(row.column1, row.column2...)? In addition, the number of result columns differs between the queries.

Any help is appreciated.

Per the BigQuery Python client documentation, you can loop over the row object as follows without specifying the exact column name:

for row in query_job:  # API request - fetches results
    # Row values can be accessed by field name or index
    assert row[0] == row.name == row["name"]
    print(row)

In addition, you can always use the SchemaField values, as described in this answer:

result = ["{0} {1}".format(schema.name, schema.field_type) for schema in table.schema]
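To see what that comprehension produces without a live table, here is a minimal sketch using a namedtuple as a hypothetical stand-in for SchemaField (the real objects, which expose .name and .field_type, come from table.schema or query results):

```python
from collections import namedtuple

# Stand-in for google.cloud.bigquery.SchemaField; real objects come from table.schema
SchemaField = namedtuple("SchemaField", ["name", "field_type"])

schema = [SchemaField("state", "STRING"), SchemaField("gender", "STRING")]

# Pair each column name with its type, without hard-coding either
result = ["{0} {1}".format(field.name, field.field_type) for field in schema]
print(result)  # ['state STRING', 'gender STRING']
```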

Here is an example, using a BigQuery public dataset, of how to access fields without hard-coding the field names:

from google.cloud import bigquery

client = bigquery.Client()

query = (
    "SELECT state, max(gender) AS gender "
    "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
    "GROUP BY state "
    "LIMIT 10"
)
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location="US",
)  # API request - starts the query

results = query_job.result()  # API request - waits for the query to finish
fields = results.schema  # list of SchemaField objects, in column order

for row in results:
    # Row values can be accessed by index; column names come from the schema
    print("{} AS {}, {} AS {}".format(row[0], fields[0].name, row[1], fields[1].name))

Which produces the following output:

superQuery:bin tamirklein$ python test.py
AK AS state, M AS gender
AL AS state, M AS gender
AR AS state, M AS gender
AZ AS state, M AS gender
CA AS state, M AS gender
CO AS state, M AS gender
CT AS state, M AS gender
DC AS state, M AS gender
DE AS state, M AS gender
FL AS state, M AS gender

You can also cast each row in your for loop to a dict (via dict(row)). The keys are then the column names, so you can do whatever you can do with a dictionary: iterate over the keys (column names), the values (column values), or both together, without needing to know the column names up-front.
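For illustration, a small sketch of that dict-based approach; the formatting helper works on any mapping, so a plain dict stands in here for a converted BigQuery Row:

```python
def describe_row(row_dict):
    # Works for any number of columns: keys are column names, values are cell values
    return ", ".join("{} AS {}".format(value, name) for name, value in row_dict.items())

# With BigQuery this would be called as describe_row(dict(row)) inside the result loop;
# a plain dict stands in for dict(row) here:
print(describe_row({"state": "AK", "gender": "M"}))  # AK AS state, M AS gender
```

Because the helper just iterates the mapping, it handles queries with different numbers of result columns without any changes.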
