Use Apache Beam arguments within the pipeline
What is the best practice for getting arguments from pipeline_options?
Dummy code example:
known_args, pipeline_args = parser.parse_known_args()
pipeline_options = PipelineOptions(pipeline_args)
with beam.Pipeline(options=pipeline_options) as pipeline:
    # here I want to use project argument
    # I can't do pipeline.options.project
    # because warning is displayed
    (
        pipeline
        | "Transformation 1" >> beam.Map(lambda x: known_args.pubsub_sub)  # this is fine
        | "Transformation 2" >> beam.Map(lambda x: pipeline.options.project)  # this is not fine
    )
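How can I use the standard arguments that the pipeline itself needs (project, region, etc.), as opposed to the user-defined ones?
I think the best practice is to use options as shown below. I kept your initial code: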
class MyPipelineOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument("--project", help="Project", required=True)
        parser.add_argument("--pubsub_sub", help="Pub Sub", required=True)

my_pipeline_options = PipelineOptions().view_as(MyPipelineOptions)
pipeline_options = PipelineOptions()
with beam.Pipeline(options=pipeline_options) as pipeline:
    # here I want to use project argument
    # I can't do pipeline.options.project
    # because warning is displayed
    (
        pipeline
        | "Transformation 1" >> beam.Map(lambda x: my_pipeline_options.pubsub_sub)
        | "Transformation 2" >> beam.Map(lambda x: my_pipeline_options.project)
    )
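I think that for predefined options like project, you have to add them to the MyPipelineOptions class in order to use them in your Python code.
There is no need to access options through the pipeline object; just use pipeline_options directly.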
Thank you for your answer.
So your answer plus the Beam documentation gave me the whole picture of options management. To summarize:
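1. Use argparse to build a simple parser.
2. Parse the arguments with parse_known_args(), which yields the known_args and beam_args objects.
3. known_args should be used directly in our pipeline code (e.g. known_args.pubsub_topic).
4. beam_args is used to create the PipelineOptions object, which is passed to the Pipeline() object.
5. For arguments that Beam already defines (project, streaming, etc.), we should not override them in our custom parser. Instead, we should create a view_as() object and use it explicitly (like known_args). Two examples follow.
The project argument:
# "project" will go into beam_args because we didn't define it in our parser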
known_args, beam_args = parser.parse_known_args()
pipeline_options = PipelineOptions(beam_args)
# this argument will be available in "gcp_args" because the GoogleCloudOptions class
# defines the "project" argument (you can check the source code)
gcp_args = pipeline_options.view_as(GoogleCloudOptions)
# and next if we want to use it somewhere we should do:
gcp_args.project
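The routing mentioned in the first comment above (an argument our parser does not define ends up in beam_args) can be checked with argparse alone. This is a small sketch with hypothetical flag values. It passes the argument list explicitly so that it does not depend on the real command line.

```python
import argparse

# Our own parser knows only about the user-defined flag.
parser = argparse.ArgumentParser()
parser.add_argument("--pubsub_sub")

# Hypothetical command line mixing a user-defined flag with Beam flags.
argv = ["--pubsub_sub", "demo-sub", "--project", "demo-project", "--streaming"]

# parse_known_args() returns the parsed namespace plus the leftover arguments.
known_args, beam_args = parser.parse_known_args(argv)

print(known_args.pubsub_sub)  # the user-defined value
print(beam_args)              # everything the parser did not recognize
```

The leftover list is what would be handed to PipelineOptions in the examples here.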
The streaming argument:
# "streaming" will go into beam_args because we didn't define it in our parser
known_args, beam_args = parser.parse_known_args()
pipeline_options = PipelineOptions(beam_args)
# this argument will be available in "std_args" because the StandardOptions class
# defines the "streaming" argument (you can check the source code)
std_args = pipeline_options.view_as(StandardOptions)
# and next if we want to use it somewhere we should do:
std_args.streaming
So in practice, to see which arguments Apache Beam already defines, we should look at the source code on GitHub.