
Exporting BigQuery table data to Google Cloud Storage with a WHERE clause using Python

I want to export table data from BigQuery to Google Cloud Storage. The problem is that I need the data from date1 to date2, not the whole table.

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  

This is what I found in the Google Cloud documentation. There is no option for adding a query or limiting the data with a WHERE clause.

Unfortunately it will be a two-step process: first you need to build the result table, and then export the result. From a cost perspective the impact should be minimal - you will pay for the storage used by the temp table holding the result, but the cost is $0.02 per GB per month - so if you manage to finish your task within 1 hour, the cost will be about $0.000027 per GB.
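(For the arithmetic behind that figure: a month is roughly 730 hours, so $0.02 per GB per month divided by 730 comes out to about $0.0000274 per GB for one hour of temp-table storage.)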

import time

from google.cloud import bigquery

client = bigquery.Client()
dataset_id = 'your_dataset'    # placeholder: dataset that will hold the temp table
bucket_name = 'your_bucket'    # placeholder: destination GCS bucket

job_config = bigquery.QueryJobConfig()
gcs_filename = 'file_*.gzip'

# Write the query result to a temp table.
table_ref = client.dataset(dataset_id).table('my_temp_table')
job_config.destination = table_ref
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

# Start the query, passing in the extra configuration.
query_job = client.query(
    """#standardSql
    select * from `project.dataset.table` where <your_condition> ;""",
    location='US',
    job_config=job_config)

# Wait until the temp table has been written.
while not query_job.done():
    time.sleep(1)
print("query completed")

# Export the temp table to GCS as gzip-compressed CSV.
job_config = bigquery.ExtractJobConfig()
job_config.compression = bigquery.Compression.GZIP
job_config.destination_format = bigquery.DestinationFormat.CSV
job_config.print_header = False

destination_uri = 'gs://{}/{}'.format(bucket_name, gcs_filename)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
    location='US')  # API request
extract_job.result()
print("extract completed")

Using the code you provided (following this doc), you can only export the whole table to GCS, not the result of a query.

Alternatively, you can download your query result, save it to a local file, and upload that file to GCS. Or, even easier, save the query result to a new BigQuery table and export that new table entirely to GCS with the code you used.

Solution: Exporting BigQuery data to Google Cloud Storage with a WHERE clause using Python

from google.cloud import bigquery
from google.cloud import storage

BUCKETNAME = 'your_bucket'  # placeholder: change to your GCS bucket name

def export_to_gcs():
    QUERY = "SELECT * FROM TABLE where CONDITION"  # change the table and where condition
    bq_client = bigquery.Client()
    query_job = bq_client.query(QUERY)  # BigQuery API request
    rows_df = query_job.result().to_dataframe()

    storage_client = storage.Client()  # Storage API request
    bucket = storage_client.get_bucket(BUCKETNAME)
    blob = bucket.blob('temp/Add_to_Cart.csv')
    blob.upload_from_string(rows_df.to_csv(sep=';', index=False, encoding='utf-8'),
                            content_type='application/octet-stream')
    return "success"

Use the 'EXPORT DATA OPTIONS' statement in native BigQuery SQL to export data from a SQL query.

Use a Python client to submit the SQL to BigQuery, which will take care of the rest.

from google.cloud import bigquery
from google.cloud import storage

BQ = bigquery.Client()
CS = storage.Client()  # Storage client (not actually used below)

def gcp_export_http(request):

    sql = """
    EXPORT DATA OPTIONS(uri="gs://gcs-bucket/*", format='PARQUET',
    compression='SNAPPY') AS SELECT * FROM
    table_name where column_name > column_value
    """

    query_job = BQ.query(sql)
    res = query_job.result()
    return res
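One caveat worth keeping in mind with EXPORT DATA: the uri is expected to contain a single * wildcard so BigQuery can shard the output into multiple files, and the destination bucket must be in a location compatible with the queried dataset.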
