
Exporting from Google BigQuery to GCS results in multiple 0-byte files

I have the EXPORT DATA statement below, which runs successfully. However, it generates 22 files of 0 bytes each. The SQL is correct and no rows should be returned; that is not my problem. The issue is why the export still produces 22 files in GCS. My expectation is that if the query returns no rows, no files should be created.

How do I stop that? Thank you.

EXPORT DATA OPTIONS (
  uri = 'gs://<<BUCKET>>/<<TABLE>>*.csv',
  format = 'CSV',
  overwrite = true,
  header = false,
  field_delimiter = '|'
) AS
SELECT DISTINCT * FROM `<<PROJECT>>.<<DATASET>>.VWE_<<TABLE>>` where cast(LASTLOADDATE as datetime) > DATETIME_SUB(CURRENT_DATE, INTERVAL 2 DAY) and LASTLOADDATE is not null;

Unfortunately, this is normal behaviour for a BigQuery export that uses a wildcard in the uri.

BigQuery shards your data into multiple files based on the provided pattern. According to the documentation, the size of the exported files will vary.

Even if the query returns no rows, with a wildcard BigQuery can still generate multiple empty files.

If you need to delete the empty files, you can write a dedicated shell script to remove them. For example:

# Check an object's size in bytes (the object path is a placeholder)
gsutil du -s -a gs://bucket/kitten.png

# Remove the object
gsutil rm gs://bucket/kitten.png
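Putting those two commands together, a cleanup script could look roughly like the sketch below. It assumes that `gsutil ls -l` prints one line per object in the form `<size>  <timestamp>  <url>`, and that object names contain no spaces. The `PREFIX` variable, the `empty_object_urls` helper and the `--run` switch are hypothetical names introduced for illustration; adjust `PREFIX` to match the uri used in your export.

```shell
#!/usr/bin/env bash
# Sketch: delete zero-byte objects left behind by an empty EXPORT DATA run.
# PREFIX is a hypothetical placeholder; point it at your export uri prefix.
PREFIX="${PREFIX:-gs://my-bucket/my_table}"

# Read `gsutil ls -l` output on stdin and print the URLs of 0-byte objects.
# Object lines are assumed to look like: "<size>  <timestamp>  gs://bucket/object".
# The trailing "TOTAL: ..." summary line is skipped because its first field is not "0".
empty_object_urls() {
  awk '$1 == "0" && $3 ~ /^gs:\/\// { print $3 }'
}

# Only touch GCS when called with --run, so the helper can be tried in isolation.
if [ "${1:-}" = "--run" ]; then
  # The wildcard is quoted so that gsutil, not the shell, expands it.
  gsutil ls -l "${PREFIX}*.csv" | empty_object_urls | while read -r url; do
    gsutil rm "$url"
  done
fi
```

Run it with `--run` after the export finishes. Before deleting anything, you may want to replace `gsutil rm "$url"` with `echo "$url"` to review which objects would be removed.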

