Use Apache Beam arguments within the pipeline

What is the best practice for getting arguments from pipeline_options?

Dummy code example:

known_args, pipeline_args = parser.parse_known_args()
pipeline_options = PipelineOptions(pipeline_args)


with beam.Pipeline(options=pipeline_options) as pipeline:
    # here I want to use project argument
    # I can't do pipeline.options.project 
    # because warning is displayed
    (
      pipeline
      | "Transformation 1" >> beam.Map(lambda x: known_args.pubsub_sub)   # this is fine

      | "Transformation 2" >> beam.Map(lambda x: pipeline.options.project)    # this is not fine
    )

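How do I use the standard arguments that the pipeline needs (project, region, etc.), as opposed to the user-defined ones?

I think the best practice is to use options as shown below. I kept your initial code: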

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class MyPipelineOptions(PipelineOptions):

    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument("--project", help="Project", required=True)
        parser.add_argument("--pubsub_sub", help="Pub Sub", required=True)
        
my_pipeline_options = PipelineOptions().view_as(MyPipelineOptions)
pipeline_options = PipelineOptions()

with beam.Pipeline(options=pipeline_options) as pipeline:
    # here I want to use project argument
    # I can't do pipeline.options.project 
    # because warning is displayed
    (
      pipeline
      | "Transformation 1" >> beam.Map(lambda x: my_pipeline_options.pubsub_sub)   

      | "Transformation 2" >> beam.Map(lambda x: my_pipeline_options.project)    
    )

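I think that for predefined options like project, you have to add them to the MyPipelineOptions class to be able to use them in your Python code.

There is no need to access options through the pipeline object; just use pipeline_options directly.

Thank you for your answer.

So your answer plus the Beam documentation gave me the full picture of option management. To summarize: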

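  1. If we need to define some extra custom arguments (a PubSub topic or anything else), we build a simple parser with argparse.
  2. From the parser we create the known_args and beam_args objects.
  3. known_args should be used directly in our pipeline code (for example known_args.pubsub_topic).
  4. beam_args is used to create the PipelineOptions object, which is passed to the Pipeline() object.
  5. [The most important point for me] If we need to explicitly use some arguments that Apache Beam already defines (project, streaming, etc.), we should not override them in our custom parser; we should create a view_as() object and use it explicitly (like known_args). Two examples follow.
     1. We want to use the GCP project argument.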
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

# "project" goes into beam_args because we did not define it in our parser
known_args, beam_args = parser.parse_known_args()
pipeline_options = PipelineOptions(beam_args)

# the argument is available through "gcp_args" because the GoogleCloudOptions
# class defines the "project" argument (you can check the source code)
gcp_args = pipeline_options.view_as(GoogleCloudOptions)

# then, wherever we need the value, we read it as an attribute:
gcp_args.project
     2. We want to use the streaming argument.
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# "streaming" goes into beam_args because we did not define it in our parser
known_args, beam_args = parser.parse_known_args()
pipeline_options = PipelineOptions(beam_args)

# the argument is available through "std_args" because the StandardOptions
# class defines the "streaming" argument (you can check the source code)
std_args = pipeline_options.view_as(StandardOptions)

# then, wherever we need the value, we read it as an attribute:
std_args.streaming
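
Steps 1 to 4 of the summary can be tried without Beam installed, since the split between known_args and beam_args is done entirely by argparse. The following is a minimal sketch: the --pubsub_topic flag and the command-line values are made up for illustration, and argv is passed explicitly so the snippet does not depend on how the script is launched.

```python
import argparse

# Hypothetical custom flag; only flags declared here end up in known_args.
parser = argparse.ArgumentParser()
parser.add_argument("--pubsub_topic", required=True, help="Custom, user-defined flag")

# Illustrative command line mixing a custom flag with Beam-defined ones.
argv = ["--pubsub_topic=my-topic", "--project=my-project", "--streaming"]

# parse_known_args() does not fail on flags it does not know;
# it returns them, in order, as the second element.
known_args, beam_args = parser.parse_known_args(argv)

print(known_args.pubsub_topic)  # the custom flag, read directly
print(beam_args)                # the leftover flags, to be handed to PipelineOptions
```

Because --project and --streaming are not declared on the custom parser, they are left in beam_args and remain available to PipelineOptions and its views.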

So, to see which arguments Apache Beam already defines, we should look at the source code on GitHub.
