How to unzip and load a TSV file into BigQuery from a GCS bucket
Below is the code I use to fetch a tsv.gz file from GCS, unzip it, and convert it into a comma-separated CSV file so that the CSV data can be loaded into BigQuery.
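import gzip

import gcsfs
import pandas as pd
from google.cloud import storage

# project_id and bucket_name are assumed to be defined elsewhere.
storage_client = storage.Client(project=project_id)
blobs_list = list(storage_client.list_blobs(bucket_name))
for blobs in blobs_list:
    if blobs.name.endswith(".tsv.gz"):
        source_file = blobs.name
        uri = "gs://{}/{}".format(bucket_name, source_file)
        gcs_file_system = gcsfs.GCSFileSystem(project=project_id)
        with gcs_file_system.open(uri) as f:
            # Decompress the gzip stream and parse it as tab-separated data.
            gzf = gzip.GzipFile(mode="rb", fileobj=f)
            csv_table = pd.read_table(gzf)
            # Write a local comma-separated copy (overwritten on every iteration).
            csv_table.to_csv('GfG.csv', index=False)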
This code does not seem to be an effective way to load the data into BigQuery, as I am running into many issues. I suspect I am doing something wrong in the file conversion. Can you tell me where it went wrong?
If your file is gzip-compressed (gzip, not zip) and stored in Cloud Storage, don't download it, unzip it, and stream-load it yourself.
BigQuery can load it directly, as is. Here is a sample:
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    autodetect=True,  # Automatic schema detection
    field_delimiter="\t",  # Tab separator for a TSV file; use "," for comma-separated data
    skip_leading_rows=1,  # Skip the header row (but keep it for the column naming)
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)

# bucket_name and source_file are the same values as in the question.
uri = "gs://{}/{}".format(bucket_name, source_file)

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.
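The sample above loads a single object. As a rough sketch of how it could be combined with the bucket listing from the question, the code below collects every .tsv.gz object and passes the whole list of URIs to one load job. The function names tsv_gz_uris and load_tsv_gz_files are made up for illustration, and the sketch assumes all files share the same columns and a header row.

```python
from typing import Iterable, List


def tsv_gz_uris(bucket_name: str, blob_names: Iterable[str]) -> List[str]:
    """Return gs:// URIs for every object whose name ends in .tsv.gz."""
    return [
        "gs://{}/{}".format(bucket_name, name)
        for name in blob_names
        if name.endswith(".tsv.gz")
    ]


def load_tsv_gz_files(project_id: str, bucket_name: str, table_id: str):
    """Load all .tsv.gz objects of a bucket into one table (hypothetical helper)."""
    # Imported here so that tsv_gz_uris can be used without the
    # Google Cloud client libraries installed.
    from google.cloud import bigquery, storage

    storage_client = storage.Client(project=project_id)
    names = (blob.name for blob in storage_client.list_blobs(bucket_name))
    uris = tsv_gz_uris(bucket_name, names)
    if not uris:
        return None  # Nothing to load.

    job_config = bigquery.LoadJobConfig(
        autodetect=True,
        field_delimiter="\t",  # Assumption: the files are tab-separated.
        skip_leading_rows=1,  # Assumption: each file starts with a header row.
        source_format=bigquery.SourceFormat.CSV,
    )
    client = bigquery.Client(project=project_id)
    # load_table_from_uri accepts a single URI or a sequence of URIs.
    load_job = client.load_table_from_uri(uris, table_id, job_config=job_config)
    load_job.result()  # Waits for the job to complete.
    return load_job.output_rows
```

Calling load_tsv_gz_files(project_id, bucket_name, table_id) needs credentials for both Cloud Storage and BigQuery. No local unzip or CSV conversion is involved.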