GCP BigQuery Error in Load Operation: Bytes are Missing
I am very new to Google Cloud Platform and I'm trying to create a table in BigQuery from ~60,000 csv.gz files stored in a GCP bucket.
To do this, I've opened Cloud Shell, and I'm trying the following:
$ bq --location=US mk my_data
$ bq --location=US \
load --null_marker='' \
--source_format=CSV --autodetect \
my_data.my_table gs://my_bucket/*.csv.gz
This throws the following error:
BigQuery error in load operation: Error processing job 'my_job:bqjob_r3eede45779dc9a51_0000017529110a63_1':
Error while reading data, error message:
FAILED_PRECONDITION: Invalid gzip file: bytes are missing
I don't know how to find which file might be problematic when loading the files. I've checked a few of the files, and they are all valid .gz files that I can open with any csv reader after decompression, but I don't know how to check through all the files to find a problematic one.
Thank you in advance for any help with this!
To loop through the objects in your bucket, you can use a script like this:
#!/bin/bash
# List every object in the bucket, then stream and decompress each one.
for f in $(gsutil ls gs://YOUR_BUCKET)
do
  # Count the decompressed bytes; zero means the archive holds no data.
  if [[ $(gsutil cat "$f" | zcat | wc -c) -eq 0 ]]
  then
    # Process it here: print the name, or delete it from the bucket as below.
    gsutil rm "$f"
  fi
done
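The loop above only flags files that decompress to zero bytes. The "bytes are missing" message suggests a truncated archive, which may still produce some output, so it can help to test integrity directly. Here is a minimal sketch using `gzip -t`. The function name `report_corrupt_gz` and the `FETCH` variable are made up for this example. `FETCH` defaults to `gsutil cat` and is only there so the function can be tried on local files with `FETCH=cat`.

```shell
#!/bin/bash
# Sketch: read object paths on stdin (one per line) and print every path
# whose content fails gzip's integrity test.
# FETCH is a hypothetical knob: the command used to stream one object.
report_corrupt_gz() {
  local fetch="${FETCH:-gsutil cat}"
  local path
  while IFS= read -r path; do
    # gzip -t reads stdin, verifies the stream, and exits non-zero on
    # truncated or otherwise invalid input. </dev/null keeps the fetch
    # command from consuming the list of paths.
    if ! $fetch "$path" </dev/null 2>/dev/null | gzip -t 2>/dev/null; then
      printf '%s\n' "$path"
    fi
  done
}

# Usage, assuming gsutil is installed and authenticated:
#   gsutil ls 'gs://YOUR_BUCKET/*.csv.gz' | report_corrupt_gz > bad_files.txt
```

This streams every object once, so with ~60,000 files it may take a while. It only prints names and does not delete anything.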
Another option is to download all your files locally, if possible, and process them from there:
gsutil -m cp -R gs://YOUR_BUCKET .
Some of the .gz files might not contain any data. You might want to write a script that checks whether each .gz file is valid.
This sample bash script loops through the .gz files in a directory and deletes them if they are empty.
for f in dir/*.gz
do
  # If the decompressed stream has no first byte, the archive is empty.
  if [[ $(gunzip -c "$f" | head -c1 | wc -c) -eq 0 ]]
  then
    rm -- "$f"
  fi
done
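If you would rather see what is wrong before deleting anything, the check can be split into "corrupt" (fails `gzip -t`) and "empty" (valid archive, no data). This is only a sketch for files already downloaded locally. The function names `classify_gz` and `scan_gz_dir` are invented for this example.

```shell
#!/bin/bash
# Sketch: classify one local .gz file as "corrupt", "empty", or "ok".
classify_gz() {
  if ! gzip -t -- "$1" 2>/dev/null; then
    # Integrity test failed: truncated or invalid gzip data.
    echo corrupt
  elif [ "$(gzip -dc -- "$1" | head -c 1 | wc -c)" -eq 0 ]; then
    # Valid archive, but it decompresses to nothing.
    echo empty
  else
    echo ok
  fi
}

# Print "STATUS<TAB>PATH" for every .gz file in a directory that is not ok.
scan_gz_dir() {
  local f status
  for f in "$1"/*.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    status=$(classify_gz "$f")
    [ "$status" = ok ] || printf '%s\t%s\n' "$status" "$f"
  done
}

# Usage: scan_gz_dir dir
```

Nothing is removed here, so you can review the list first and then decide whether to delete or re-upload the flagged files.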