
How to read a BigQuery table using Python pipeline code in GCP Dataflow

Can someone share the syntax to read from and write to a BigQuery table in a pipeline written in Python for GCP Dataflow?

Run on Dataflow

First, construct a Pipeline with the following options for it to run on GCP Dataflow:

import apache_beam as beam

# Replace the <placeholder> values with your own project, region, and setup file.
options = {'project': <project>,
           'runner': 'DataflowRunner',
           'region': <region>,
           'setup_file': <setup.py file>}
pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
pipeline = beam.Pipeline(options=pipeline_options)
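The `**options` expansion above feeds the same settings that could otherwise be passed as command-line flags. As a small illustration (pure Python, no Beam needed, and the setting values are hypothetical), a dict of options maps to flag strings like this:

```python
def options_to_flags(options):
    """Convert an options dict to the equivalent '--key=value' flag strings."""
    return ['--{}={}'.format(key, value) for key, value in options.items()]

# Hypothetical settings; substitute your own project, region, and setup file.
options = {'project': 'my-project',
           'runner': 'DataflowRunner',
           'region': 'us-central1',
           'setup_file': './setup.py'}

flags = options_to_flags(options)
# PipelineOptions(flags) and PipelineOptions(flags=[], **options)
# would construct equivalent options objects from these two forms.
print(flags)
```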

Read from BigQuery

Define a BigQuerySource with your query and use beam.io.Read to read data from BQ:

BQ_source = beam.io.BigQuerySource(query = <query>)
BQ_data = pipeline | beam.io.Read(BQ_source)
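Each element the BigQuery source emits is a plain Python dict keyed by column name, so downstream transforms are ordinary dict handling. A minimal sketch of such a transform as a pure function (the column names here are made up for illustration) that could be applied with `beam.Map`:

```python
def extract_name_and_count(row):
    """Turn one BigQuery result row (a dict keyed by column name)
    into a (name, count) tuple. The column names are hypothetical."""
    return (row['name'], int(row['count']))

# In the pipeline this would run per element, e.g.:
#   pairs = BQ_data | beam.Map(extract_name_and_count)
sample_row = {'name': 'alice', 'count': '3'}  # shaped like a row dict from BQ
print(extract_name_and_count(sample_row))  # ('alice', 3)
```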

Write to BigQuery

There are two options to write to BigQuery:

  • use a BigQuerySink and beam.io.Write:

     BQ_sink = beam.io.BigQuerySink(<table>, dataset=<dataset>, project=<project>)
     BQ_data | beam.io.Write(BQ_sink)

  • use beam.io.WriteToBigQuery:

     BQ_data | beam.io.WriteToBigQuery(<table>, dataset=<dataset>, project=<project>)
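Both sinks also accept a `schema` given as a comma-separated `'NAME:TYPE'` string (shown in the second answer below). As a small illustration, assuming column names and BigQuery types of your own choosing, that string form can be assembled from pairs like this:

```python
def make_schema_string(columns):
    """Build the 'NAME:TYPE,NAME:TYPE' schema string the BigQuery sinks accept.

    `columns` is a list of (name, bigquery_type) pairs; the names used
    here are examples only.
    """
    return ','.join('{}:{}'.format(name, bq_type) for name, bq_type in columns)

schema = make_schema_string([('name', 'STRING'), ('count', 'INTEGER')])
# Then, for example:
#   BQ_data | beam.io.WriteToBigQuery(<table>, dataset=<dataset>,
#                                     project=<project>, schema=schema)
print(schema)  # name:STRING,count:INTEGER
```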

Reading from BigQuery

rows = p | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(query=QUERY, use_standard_sql=True))

Writing to BigQuery

rows | 'writeToBQ' >> beam.io.Write(
    beam.io.BigQuerySink(
        '{}:{}.{}'.format(PROJECT, BQ_DATASET_ID, BQ_TEST),
        schema='CONVERSATION:STRING, LEAD_ID:INTEGER',
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
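The `'{}:{}.{}'` format string above produces the fully-qualified `PROJECT:DATASET.TABLE` spec that Beam's BigQuery I/O understands. A small helper sketch (the identifiers are placeholders for illustration) that builds and sanity-checks such a spec:

```python
def table_spec(project, dataset, table):
    """Format a fully-qualified BigQuery table spec: PROJECT:DATASET.TABLE."""
    for part in (project, dataset, table):
        if not part:
            raise ValueError('project, dataset and table must be non-empty')
    return '{}:{}.{}'.format(project, dataset, table)

# Hypothetical identifiers for illustration only.
spec = table_spec('my-project', 'analytics', 'conversations')
print(spec)  # my-project:analytics.conversations
```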

