Partitioning a table in BigQuery by file

Question

I would like to create a table that is partitioned based on the filename. For example, let's say I have a thousand sales files, one for each date, such as:

Files/Sales_2014-01-01.csv , Files/Sales_2014-01-02.csv , ...

I would like to partition the table based on the filename (which is essentially the date). Is there a way to do this in BQ? For example, I want to do a load job similar to the following (in pseudocode):

bq load gs://Files/Sales*.csv PARTITION BY filename

What would be the closest thing I could do to that?

Answer 1

When your data has a TIMESTAMP, DATE, or DATETIME column, first create a partitioned table using time-unit column partitioning. When you load data into the table, BigQuery automatically puts each row into the correct partition, based on the value in that column. To create an empty time-unit column-partitioned table with the bq CLI, use the following command:

  bq mk -t \
  --schema 'ts:DATE,qtr:STRING,sales:FLOAT' \
  --time_partitioning_field ts \
  --time_partitioning_type DAY \
  mydataset.mytable

Then load all your sales files into that time-unit column-partitioned table. BigQuery will automatically put each row into the correct partition. The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset, with the schema auto-detected. See the BigQuery documentation on loading CSV data from Cloud Storage for more information.

  bq load \
  --autodetect \
  --source_format=CSV \
  mydataset.mytable \
  gs://mybucket/mydata*.csv
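
If the files do not contain a date column, one possible workaround is to derive the partition from each file name and load that file into a specific partition with a partition decorator (mytable$YYYYMMDD). The sketch below only prints the bq load commands rather than running them. It assumes file names of the form Sales_YYYY-MM-DD.csv, a hypothetical gs://mybucket/Files/ location, and an existing day-partitioned mydataset.mytable (ingestion-time partitioned, or column-partitioned where every row in a file falls on that file's date). The helper names partition_for_file and load_command are illustrative, not part of the bq CLI.

```shell
# Sketch: map each file name to a partition decorator and print the load command.
# Assumptions (not from the original question): bucket path, table name,
# and the Sales_YYYY-MM-DD.csv naming pattern.

# Turn a path such as Files/Sales_2014-01-01.csv into 20140101.
partition_for_file() {
  # strip directory and .csv, drop the Sales_ prefix, remove the dashes
  basename "$1" .csv | sed 's/^Sales_//' | tr -d '-'
}

# Print (do not run) the bq load command for one file.
# The table is quoted so the shell does not expand the $ in the decorator.
load_command() {
  printf "bq load --source_format=CSV 'mydataset.mytable\$%s' '%s'\n" \
    "$(partition_for_file "$1")" "$1"
}

for uri in \
  gs://mybucket/Files/Sales_2014-01-01.csv \
  gs://mybucket/Files/Sales_2014-01-02.csv
do
  load_command "$uri"
done
```

Once the printed commands look right, they can be piped to sh or the printf can be replaced with a direct bq load call. In practice the list of URIs would likely come from a listing of the bucket rather than a hard-coded loop.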
