How can I tell Dataflow to "use_unsupported_python_version" with PipelineOptions?

I'm trying to use Google Dataflow to transfer data from one BigQuery table to another:

import apache_beam as beam
from apache_beam.io.gcp.internal.clients import bigquery
from apache_beam.options.pipeline_options import PipelineOptions

import argparse

def parseArgs():
  parser = argparse.ArgumentParser()
  parser.add_argument(
    '--experiment',
    default='use_unsupported_python_version',
    help='This does not seem to do anything.')
  args, beam_args = parser.parse_known_args()
  return beam_args

def beamer(rows=[]):
  if len(rows) == 0:
    return

  project = 'myproject-474601'
  gcs_temp_location = 'gs://my_temp_bucket/tmp'
  gcs_staging_location = 'gs://my_temp_bucket/staging'

  table_spec = bigquery.TableReference(
    projectId=project,
    datasetId='mydataset',
    tableId='test')
  beam_options = PipelineOptions(
    parseArgs(), # This doesn't seem to work.
    project=project,
    runner='DataflowRunner',
    job_name='unique-job-name',
    temp_location=gcs_temp_location,
    staging_location=gcs_staging_location,
    use_unsupported_python_version=True, # This doesn't work either. :(
    experiment='use_unsupported_python_version' # This also doesn't work.
  )

  with beam.Pipeline(options=beam_options) as p:
    quotes = p | beam.Create(rows)

    quotes | beam.io.WriteToBigQuery(
      table_spec,
      # custom_gcs_temp_location = gcs_temp_location, # Not needed?
      method='FILE_LOADS',
      write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
      create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)
  return

if __name__ == '__main__':
    beamer(rows=[{'id': 'ein', 'value': None, 'year': None, 'valueHistory': [{'year': 2021, 'amount': 900}]}])

But apparently Dataflow doesn't support my Python version, because I'm getting this error:

Exception: Dataflow runner currently supports Python versions ['3.6', '3.7', '3.8'], got 3.9.7 (default, Sep 16 2021, 08:50:36) 
[Clang 10.0.0 ].
To ignore this requirement and start a job using an unsupported version of Python interpreter, pass --experiment use_unsupported_python_version pipeline option.

So I tried adding a use_unsupported_python_version parameter to PipelineOptions, to no avail. I also tried an experiment option. The official pipeline option docs show args being successfully merged into PipelineOptions, so I tried that too.

Yet I continue to get the same unsupported version error. How can I get Dataflow to use my version of Python?

Try passing experiments=['use_unsupported_python_version']. You can delete your implementation of parseArgs as well.
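A stdlib-only sketch of why the question's parseArgs approach failed (flag names illustrative): defining your own `--experiment` argument means `parse_known_args()` consumes that flag into your namespace, so it never survives into the leftover args handed to `PipelineOptions`. Without a competing definition, the flag passes through untouched:

```python
import argparse

# Simulated command line for a Beam pipeline.
argv = ['--experiment=use_unsupported_python_version', '--job_name=demo']

# The question's approach: a local parser that defines --experiment.
own_parser = argparse.ArgumentParser()
own_parser.add_argument('--experiment')
_, beam_args = own_parser.parse_known_args(argv)
# The experiment flag was consumed locally and is gone from the leftovers:
print(beam_args)  # ['--job_name=demo']

# A parser that does NOT define --experiment leaves it for
# PipelineOptions to pick up later.
plain_parser = argparse.ArgumentParser()
_, passthrough = plain_parser.parse_known_args(argv)
print(passthrough)  # ['--experiment=use_unsupported_python_version', '--job_name=demo']
```

This is why deleting parseArgs (or not registering `--experiment` in it) and passing the keyword `experiments=[...]` directly to PipelineOptions resolves the error.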

Please add an option like this: '--experiment=use_unsupported_python_version'
