简体   繁体   中英

Partitioning a table in BigQuery by file

I would like to create a table that is partitioned based on the filename. For example, let's say I have a thousand sales file, one for each date such as:

  • Files/Sales_2014-01-01.csv , Files/Sales_2014-01-02.csv , ...

I would like to partition the table based on the filename (which is essentially the date). Is there a way to do this in BQ? For example, I want to do a load job similar to the following (in pseudocode):

bq load gs://Files/Sales*.csv PARTITION BY filename

What would be the closest thing I could do to that?

When you have a TIMESTAMP, DATE, or DATETIME column in a table, first create a partitioned table by using the Time-unit column partitioning . When you load data to the table, BigQuery automatically puts the data into the correct partitions, based on the values in the column. To create an empty partitioned table for time-unit column-partitioned using bq CLI, please refer to the below command:

  bq mk -t \
  --schema 'ts:DATE,qtr:STRING,sales:FLOAT' \
  --time_partitioning_field ts \
  --time_partitioning_type DAILY \
  mydataset.mytable

Then load all your sales files into that Time-unit column partitioning table. It will automatically put the data into the correct partition. The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The schema would be auto detected. Please refer to this link for more information.

  bq load \
  --autodetect \
  --source_format=CSV \
  mydataset.mytable \
  gs://mybucket/mydata*.csv

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM