Load csv.gz file from Google Cloud Storage to BigQuery using Python

I want to load a csv.gz file from Cloud Storage into BigQuery. Right now I am using the code below, but I am not sure whether it is an efficient way to load data into BigQuery.

# -*- coding: utf-8 -*-
from io import BytesIO

import pandas as pd
import pandas_gbq as gbq
from google.cloud import storage

# service_account: path to the service account JSON key file
client = storage.Client.from_service_account_json(service_account)
bucket = client.get_bucket("bucketname")
blob = bucket.blob("somefile.csv.gz")
# The whole file is downloaded into memory before parsing
content = blob.download_as_bytes()
df = pd.read_csv(BytesIO(content), delimiter=',', quotechar='"', low_memory=False)
df = df.astype(str)
# regex=False: with regex matching, "|" is an alternation and removes nothing
df.columns = df.columns.str.replace("|", "", regex=False)
# pd.datetime was removed in pandas 2.0; pd.Timestamp.now() is the replacement
df["dateinsert"] = pd.Timestamp.now()
gbq.to_gbq(df, 'desttable',
           'projectid',
           chunksize=None,
           if_exists='append'
           )

Please help me write this code in a more efficient way.

I propose the following process:

  • Perform a load job into BigQuery
    • Add the schema (yes, listing 150 columns is tedious...)
    • Add the skip-leading-rows option to skip the header: job_config.skip_leading_rows = 1
    • Name your table like this: <dataset>.<tableBaseName>_<Datetime>. The datetime must be a string in a format that is valid in a BigQuery table name, for example YYYYMMDDHHMM
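
The steps above can be sketched with the google-cloud-bigquery client as follows. This is a minimal sketch, not a definitive implementation: the function names, the `key_path` and `column_names` parameters, and the choice to declare every column as STRING (mirroring the `astype(str)` in the question) are assumptions made for illustration. The load job reads the gzipped CSV directly from Cloud Storage, so nothing is downloaded or parsed with pandas.

```python
from datetime import datetime, timezone


def timestamped_table_id(dataset, base_name, now=None):
    """Build '<dataset>.<base_name>_<YYYYMMDDHHMM>' (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return f"{dataset}.{base_name}_{now:%Y%m%d%H%M}"


def load_gzipped_csv(uri, table_id, column_names, key_path):
    """Run a BigQuery load job for a gzipped CSV stored in Cloud Storage.

    uri          -- e.g. "gs://bucketname/somefile.csv.gz"
    table_id     -- e.g. the value returned by timestamped_table_id()
    column_names -- assumed list of header names; all loaded as STRING
    key_path     -- assumed path to the service account JSON key file
    """
    # Imported here so the helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(key_path)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        schema=[bigquery.SchemaField(name, "STRING") for name in column_names],
    )
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    job.result()  # block until the load job finishes; raises on failure
    return job.output_rows
```

A call might look like `load_gzipped_csv("gs://bucketname/somefile.csv.gz", timestamped_table_id("mydataset", "desttable"), column_names, service_account)`, where `mydataset` and `column_names` are placeholders.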

When you query your data, you can query a subset of the tables and inject the table name into the query result, like this:

SELECT *,
       (SELECT table_id
        FROM `<project>.<dataset>.__TABLES_SUMMARY__`
        WHERE table_id LIKE '<tableBaseName>%')
FROM `<project>.<dataset>.<tableBaseName>*`

Of course, you can refine the * with the year, month, day, and so on. For example, <tableBaseName>_2020* matches only the tables loaded in 2020.
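
As a possible alternative to the subquery on `__TABLES_SUMMARY__`, wildcard tables in BigQuery expose a `_TABLE_SUFFIX` pseudo-column holding the part of the table name matched by the `*`. A scalar subquery raises an error as soon as it returns more than one row, which would happen once several tables share the base name, so the pseudo-column may be the more robust choice. The sketch below only builds the query text; the function name, the `load_datetime` alias, and the `date_prefix` parameter are assumptions for illustration.

```python
def wildcard_query(project, dataset, base_name, date_prefix=""):
    """Build a wildcard-table query that returns the load datetime per row.

    date_prefix narrows the wildcard, e.g. "2020" or "202001"; it must
    contain digits only because it is inlined into the SQL text.
    """
    if date_prefix and not date_prefix.isdigit():
        raise ValueError("date_prefix must contain digits only")
    table = f"{project}.{dataset}.{base_name}_{date_prefix}*"
    # _TABLE_SUFFIX holds only what '*' matched, so the prefix is re-attached.
    return (
        f"SELECT *, CONCAT('{date_prefix}', _TABLE_SUFFIX) AS load_datetime "
        f"FROM `{table}`"
    )
```

The resulting string can be passed to `bigquery.Client.query(...)`; for instance, `wildcard_query("projectid", "mydataset", "desttable", "2020")` targets only the tables whose suffix starts with 2020.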

I think this meets all your requirements. Leave a comment if something goes wrong.
