Load csv.gz file from Google Storage to BigQuery using Python
I want to load a csv.gz file from Cloud Storage into BigQuery. Right now I am using the code below, but I am not sure it is an efficient way to load the data into BigQuery.
# -*- coding: utf-8 -*-
from io import BytesIO
from datetime import datetime
import pandas as pd
from google.cloud import storage
import pandas_gbq as gbq

# service_account is the path to the service-account JSON key (defined elsewhere)
client = storage.Client.from_service_account_json(service_account)
bucket = client.get_bucket("bucketname")
blob = bucket.blob("somefile.csv.gz")
content = blob.download_as_string()
df = pd.read_csv(BytesIO(content), delimiter=',', quotechar='"', low_memory=False)
df = df.astype(str)
# strip literal pipe characters from column names
df.columns = df.columns.str.replace("|", "", regex=False)
df["dateinsert"] = datetime.now()  # pd.datetime is deprecated
gbq.to_gbq(df, 'desttable',
           'projectid',
           chunksize=None,
           if_exists='append')
Please assist me in writing this code in an efficient way.
I propose this process:
Add the skip-leading-rows option to skip the header:

job_config.skip_leading_rows = 1

Name your tables like this:

<dataset>.<tableBaseName>_<Datetime>

The datetime must be in a string format compliant with BigQuery table names, for example YYYYMMDDHHMM.
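The steps above can be sketched as a BigQuery load job that ingests the gzipped CSV straight from Cloud Storage, skipping pandas entirely. This is a sketch, not your exact setup: the bucket, project, dataset, and table base name below are placeholders, and `build_table_id` is a helper introduced here for illustration.

```python
from datetime import datetime

def build_table_id(project, dataset, base_name, when=None):
    # BigQuery table names allow letters, digits and underscores,
    # so format the datetime suffix as YYYYMMDDHHMM.
    when = when or datetime.utcnow()
    return f"{project}.{dataset}.{base_name}_{when:%Y%m%d%H%M}"

def load_gz_csv(uri, table_id):
    # Imported here so the helper above stays usable without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row
        autodetect=True,      # let BigQuery infer the schema
    )
    # BigQuery decompresses .gz CSV files from GCS automatically.
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load to finish

# Example with placeholder names:
# table_id = build_table_id("projectid", "dataset", "desttable")
# load_gz_csv("gs://bucketname/somefile.csv.gz", table_id)
```

This avoids downloading the file and round-tripping it through a DataFrame; BigQuery reads the object directly from the bucket.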
When you query your data, you can query a subset of the tables and inject the table name into the query result, like this:
SELECT *,(SELECT table_id
FROM `<project>.<dataset>.__TABLES_SUMMARY__`
WHERE table_id LIKE '<tableBaseName>%') FROM `<project>.<dataset>.<tableBaseName>*`
Of course, you can refine the * with the year, month, day, ...
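For example, assuming the tables are suffixed with YYYYMMDDHHMM as above, restricting the wildcard to January 2020 could look like this (a sketch; project, dataset, and table base name are placeholders):

```sql
SELECT *
FROM `<project>.<dataset>.<tableBaseName>*`
WHERE _TABLE_SUFFIX LIKE '202001%'  -- only tables loaded in January 2020
```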
I think this meets all your requirements. Comment if something goes wrong.