BigQuery - Get the total number of columns in a BigQuery table

Question

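Is there a way to query the total number of columns in a BigQuery table? I went through the BigQuery documentation but did not find anything relevant.

Thanks in advance!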

Answer 1

Use a SQL query against the built-in INFORMATION_SCHEMA views:

SELECT count(distinct column_name) 
FROM  `project_id`.name_of_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = "name_of_table"
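
For readers who want to run this query from Python, here is a minimal sketch, not a definitive implementation. It assumes a client object that behaves like google.cloud.bigquery.Client, that is, a query(sql) method whose result() yields rows indexable by position. The helper names column_count_sql and count_columns, and the deliberately conservative identifier check, are illustrative assumptions and not part of the answer.

```python
import re

# Conservative allow-list (an assumption): letters, digits, underscores, hyphens.
# Real BigQuery table names may contain more characters than this accepts.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_-]+")


def column_count_sql(project: str, dataset: str, table: str) -> str:
    """Build the INFORMATION_SCHEMA query that counts a table's columns."""
    for part in (project, dataset, table):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unexpected characters in identifier: {part!r}")
    return (
        "SELECT COUNT(DISTINCT column_name) AS total_columns "
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMNS "
        f"WHERE table_name = '{table}'"
    )


def count_columns(client, project: str, dataset: str, table: str) -> int:
    """Run the query on a BigQuery-like client and return the single count."""
    rows = client.query(column_count_sql(project, dataset, table)).result()
    first_row = next(iter(rows))  # the query returns exactly one row
    return first_row[0]
```

With the real library this would be called as count_columns(bigquery.Client(), "project_id", "name_of_dataset", "name_of_table"). The identifier check is there because the names are interpolated into the SQL text instead of being passed as query parameters.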

Answer 2

There are a couple of ways to do this:

A. Use the bq command-line tool and jq (a command-line JSON processor) to parse the JSON.

bq --format=json show publicdata:samples.shakespeare | jq '.schema.fields | length'

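This outputs the number of fields in the table schema: 4 for this table, matching the four fields in the response shown under B.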

B. Use the REST API to make a Tables:get call:

GET https://www.googleapis.com/bigquery/v2/projects/projectId/datasets/datasetId/tables/tableId

This returns the full table resource as JSON, which you can parse to get the length of schema.fields.

{
   "kind":"bigquery#table",
   "description":"This dataset is a word index of the works of Shakespeare, giving the number of times each word appears in each corpus.",
   "creationTime":"1335916045099",
   "tableReference":{
      "projectId":"publicdata",
      "tableId":"shakespeare",
      "datasetId":"samples"
   },
   "numRows":"164656",
   "numBytes":"6432064",
   "etag":"\"E7ZNanj79wmDHI9DmeCWoYoUpAE/MTQxMzkyNjgyNzI1Nw\"",
   "lastModifiedTime":"1413926827257",
   "type":"TABLE",
   "id":"publicdata:samples.shakespeare",
   "selfLink":"https://www.googleapis.com/bigquery/v2/projects/publicdata/datasets/samples/tables/shakespeare",
   "schema":{
      "fields":[
         {
            "description":"A single unique word (where whitespace is the delimiter) extracted from a corpus.",
            "type":"STRING",
            "name":"word",
            "mode":"REQUIRED"
         },
         {
            "description":"The number of times this word appears in this corpus.",
            "type":"INTEGER",
            "name":"word_count",
            "mode":"REQUIRED"
         },
         {
            "description":"The work from which this word was extracted.",
            "type":"STRING",
            "name":"corpus",
            "mode":"REQUIRED"
         },
         {
            "description":"The year in which this corpus was published.",
            "type":"INTEGER",
            "name":"corpus_date",
            "mode":"REQUIRED"
         }
      ]
   }
}
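
As a rough illustration of parsing that response, the sketch below counts the entries in schema.fields with Python's standard json module. The function names are hypothetical, SHAKESPEARE is a trimmed copy of the response above, and NESTED is a made-up schema included only to show how a RECORD column's nested "fields" list could be counted as well.

```python
import json


def count_top_level_columns(table_resource: dict) -> int:
    """Return the number of top-level fields in a Tables:get response."""
    return len(table_resource.get("schema", {}).get("fields", []))


def count_leaf_fields(fields: list) -> int:
    """Count fields recursively, descending into any nested "fields" lists."""
    total = 0
    for field in fields:
        nested = field.get("fields")
        total += count_leaf_fields(nested) if nested else 1
    return total


# Trimmed copy of the response above (descriptions and metadata omitted).
SHAKESPEARE = json.loads("""
{"schema": {"fields": [
  {"type": "STRING",  "name": "word",        "mode": "REQUIRED"},
  {"type": "INTEGER", "name": "word_count",  "mode": "REQUIRED"},
  {"type": "STRING",  "name": "corpus",      "mode": "REQUIRED"},
  {"type": "INTEGER", "name": "corpus_date", "mode": "REQUIRED"}
]}}
""")

# Hypothetical schema with one nested RECORD, for illustration only.
NESTED = {"schema": {"fields": [
    {"name": "id", "type": "INTEGER"},
    {"name": "address", "type": "RECORD", "fields": [
        {"name": "city", "type": "STRING"},
        {"name": "zip", "type": "STRING"},
    ]},
]}}

if __name__ == "__main__":
    print(count_top_level_columns(SHAKESPEARE))
    print(count_top_level_columns(NESTED))
    print(count_leaf_fields(NESTED["schema"]["fields"]))
```

Whether a RECORD column should count as one column or as its leaf fields depends on what you need; the two functions simply make that choice explicit.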

Answer 3

This may be useful:

#standardSQL
with table1 as(
select "somename1" as name, "someaddress1" adrs union all
select "somename2" as name, "someaddress2" adrs union all
select "somename3" as name, "someaddress3" adrs
)
select  array_length(regexp_extract_all(to_json_string(table1),"\":"))total_columns from table1 limit 1
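
The query above serializes one row to JSON and counts occurrences of the two-character sequence `":`, which follows every key. The Python sketch below mimics that idea with json.dumps and re.findall so the behaviour can be tried locally; it is an approximation of TO_JSON_STRING (assumed here to produce compact JSON), and the function name is hypothetical. It also shows one way the count can drift: a string value that itself contains `":` is counted as an extra key.

```python
import json
import re


def count_columns_via_json(row: dict) -> int:
    """Serialize a row to compact JSON and count the '":' key terminators."""
    row_json = json.dumps(row, separators=(",", ":"))
    return len(re.findall(r'":', row_json))


# Same shape as the rows of table1 in the query above.
plain_row = {"name": "somename1", "adrs": "someaddress1"}

# A single-column row whose value contains the sequence '":'.
# The quote is escaped as \" in the JSON text, but '":' still appears.
tricky_row = {"name": 'a":b'}

if __name__ == "__main__":
    print(count_columns_via_json(plain_row))
    print(count_columns_via_json(tricky_row))
```

Keys inside nested objects are counted too, so for a nested row this kind of count reflects every key in the JSON text, not only the top-level columns.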

Answer 4

Here is an alternative that does not require jq, but is a bit more "expensive" ;-):

bq --format=csv query "select * FROM publicdata:samples.shakespeare LIMIT 1"|tail -n1|sed 's/[^,]//g' | wc -c

Note: I doubt this works for tables that contain multiple repeated or nested columns.
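
Counting commas in a data row can also go wrong when a value itself contains a comma. As a hedged sketch, the snippet below parses CSV text with Python's csv module and counts the fields of the header row instead. It assumes the CSV output starts with a header row and quotes values that contain commas (standard CSV behaviour); the function names and sample text are made up for illustration.

```python
import csv
import io


def count_csv_columns(csv_text: str) -> int:
    """Count columns by parsing the first (header) row with a CSV parser."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return len(header)


def naive_comma_count(csv_text: str) -> int:
    """Mimic the shell pipeline: count commas in the last line, plus one."""
    last_line = csv_text.strip().splitlines()[-1]
    return last_line.count(",") + 1


# Hypothetical output: four columns, one value containing a comma.
SAMPLE = 'word,word_count,corpus,corpus_date\n"hello, world",5,hamlet,1600\n'

if __name__ == "__main__":
    print(count_csv_columns(SAMPLE))
    print(naive_comma_count(SAMPLE))
```

For SAMPLE the parser-based count is 4 while the comma count is 5, because the quoted comma inside the first value is counted as a separator.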

Answer 5

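Just adding a snippet to get the schema in Python:

# Uses the legacy `gcloud` package, as in the original snippet.
from gcloud import bigquery

dataset_name = "dataset_name"  # replace with your dataset name
table_name = "table_name"      # replace with your table name

client = bigquery.Client(project="project_id")
datasets = client.list_datasets()
no_columns = None  # stays None if the table is not found
for ds in datasets[0]:
    if no_columns is not None:
        break
    if ds.name == dataset_name:
        for table in ds.list_tables()[0]:
            if table.name == table_name:
                table.reload()  # load the table metadata, including its schema
                no_columns = len(table.schema)
                break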

The no_columns variable holds the number of columns of the required table.

Answer 6

In Node.js, I use this code to get the length:

const { BigQuery } = require('@google-cloud/bigquery');

var params= {bq_project_id : "my_project_id"};//YOUR PROJECT ID
params.bq_dataset_id = "my_dataset_id"; //YOUR DATASET ID
params.bq_table_id = "my_table_id"; //YOUR TABLE ID
params.bq_keyFilename = './my_bq_key.json';//YOUR KEY PATH

const bigquery = new BigQuery({
    projectId: params.bq_project_id,
    keyFilename: params.bq_keyFilename,
});
async function colNums() {
    let resp = await bigquery.dataset(params.bq_dataset_id).table(params.bq_table_id).get();
    console.log(resp[1].schema.fields.length)
}
colNums();

I am not sure whether resp[1] works for everyone (if you have problems, try inspecting the resp object).

Answer 7

You can now use INFORMATION_SCHEMA, a series of views that provide access to metadata about datasets, tables, and views.

For example:

SELECT * EXCEPT(is_generated, generation_expression, is_stored, is_updatable)
FROM `bigquery-public-data.hacker_news.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'stories'

The INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view is also useful when you need all the nested fields within a RECORD (or STRUCT) column.

Answer 8

Using the Python client library for Google BigQuery:

from google.cloud import bigquery

# Raw string so the backslash in the Windows-style path is not treated as an escape.
bq_client = bigquery.Client.from_service_account_json(r"mypath\Service_Account_JSON_key_path")
table_id = "myproject.mydataset.mytable"
table = bq_client.get_table(table_id)  # API request.
print("The table {} has {} rows and {} columns".format(table_id, table.num_rows, len(table.schema)))

8 answers

Answer 1: score 7, 2020-02-13 17:10:50
Answer 2: score 6, accepted, 2015-05-21 07:30:21
Answer 3: score 1, 2018-05-10 20:03:29
Answer 4: score 0, 2015-05-21 19:31:56
Answer 5: score 0, 2017-01-03 08:28:16
Answer 6: score 0, 2019-09-26 20:19:43
Answer 7: score 0, 2019-09-26 20:33:30
Answer 8: score 0, 2023-02-02 17:26:58
