
Exporting data from BigQuery to CSV

I have a record set of fewer than 100k rows. BigQuery won't let me download it to my computer in CSV format, claiming it is too big and that I have to store it first. Is there any workaround? I want to load my output into R, and the easiest way is using CSV.

You need to use Google Cloud Storage (GCS) for your export job. The BigQuery documentation explains how to export data; also check the variants for the different path syntaxes.

Then you can download the files from GCS to your local storage. There is no way to download large data from BigQuery directly to your local computer; you need to go through GCS.
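The two steps described above (export to GCS, then download) could be sketched in Python roughly as follows. This is a minimal sketch, not the answerer's method: it assumes the `google-cloud-bigquery` and `google-cloud-storage` packages are installed, that the data already sits in a table (query results would first need to be saved to a destination table), and that the bucket exists in the same location as the dataset. The function names, the `part-*.csv` naming, and the `location="US"` default are illustrative assumptions.

```python
import os


def build_destination_uri(bucket_name, prefix):
    """Return a wildcard GCS URI so BigQuery may shard a large export."""
    return f"gs://{bucket_name}/{prefix.strip('/')}/part-*.csv"


def export_and_download(table_id, bucket_name, prefix, local_dir, location="US"):
    """Export a table to GCS as CSV, then download the files (hypothetical helper)."""
    # Imported lazily so the URI helper above works without the client libraries.
    from google.cloud import bigquery, storage

    # Step 1: run an extract job that writes CSV files to the bucket
    # (CSV is the default destination format of an extract job).
    bq_client = bigquery.Client()
    extract_job = bq_client.extract_table(
        table_id, build_destination_uri(bucket_name, prefix), location=location
    )
    extract_job.result()  # block until the export finishes

    # Step 2: download each exported file from GCS to local storage.
    os.makedirs(local_dir, exist_ok=True)
    bucket = storage.Client().bucket(bucket_name)
    downloaded = []
    for blob in bucket.list_blobs(prefix=prefix.strip("/") + "/"):
        target = os.path.join(local_dir, os.path.basename(blob.name))
        blob.download_to_filename(target)
        downloaded.append(target)
    return downloaded
```

A call might look like `export_and_download("<some-project>.<some-dataset>.<some-table>", "<some-bucket>", "exports/run1", "./data")`, where every name is a placeholder to replace with your own.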

I'm using the following Python script for this task. It can handle large datasets without loading them into memory.

Make sure to install the dependency:

pip install google-cloud-bigquery

Change the variables (the query, project, output file, and, if required, the file encoding) to fit your needs:

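from google.cloud import bigquery
import csv

# Output file
output_file = "output.csv"

# GCP project
project = "<some-project>"

# File encoding - the utf-8-sig codec writes a BOM at the start of the file,
# which lets Excel detect UTF-8; use "utf-8" if a BOM is unwanted
file_encoding = "utf-8-sig"

# The query to execute (qualify the table with its dataset; backticks are
# needed in standard SQL when a name contains a hyphen)
query = """
SELECT * FROM `<some-dataset>.my-table`
"""

client = bigquery.Client(project=project)
query_job = client.query(query)
# result() waits for the job and returns an iterator that fetches rows page by page
result = query_job.result()
schema = result.schema

# newline="" lets the csv module control line endings
with open(output_file, "w", encoding=file_encoding, newline="") as f:
    writer = csv.writer(f)
    # Write headers
    header = [field.name for field in schema]
    writer.writerow(header)
    # Write data to file, one row at a time
    for row in result:
        writer.writerow(row.values())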
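The row-by-row writing in the script above can be isolated into a small helper and tried without a BigQuery connection. The sketch below is only an illustration: `write_rows_to_csv` is a hypothetical name, and plain tuples stand in for BigQuery rows. It shows why memory use stays flat (rows are consumed from an iterator one at a time) and what the `utf-8-sig` encoding does on output (it prepends a BOM).

```python
import csv


def write_rows_to_csv(path, header, rows, encoding="utf-8-sig"):
    """Stream rows to a CSV file one at a time and return the row count."""
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:  # rows may be any iterator; it is consumed lazily
            writer.writerow(row)
            count += 1
    return count
```

Passing a generator or the query result iterator as `rows` means only the current row (or page of rows) is held in memory, regardless of how large the result set is.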
