Dataflow reading from PubSub works at GCP, can't run locally
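I have a small test Dataflow job that just reads from a PubSub subscription and discards the messages, which we are using to start some proof-of-concept work.
It runs fine on GCP but fails locally. My expectation was that the same code should work either way, just by switching the Dataflow runner, but perhaps that is not the case? Here is the code: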
import os
from datetime import datetime
import logging

from apache_beam import Map, io, Pipeline
from apache_beam.options.pipeline_options import PipelineOptions


def noop(element):
    pass


def run(input_subscription, pipeline_args=None):
    pipeline_options = PipelineOptions(
        pipeline_args, streaming=True, save_main_session=True
    )
    with Pipeline(options=pipeline_options) as pipeline:
        (
            pipeline
            | "Read from Pub/Sub" >> io.ReadFromPubSub(subscription=input_subscription, with_attributes=True)
            | "noop" >> Map(noop)
        )


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    run(
        os.environ['INPUT_SUBSCRIPTION'],
        [
            '--runner', os.getenv('RUNNER', 'DirectRunner'),
            '--project', os.getenv('PROJECT'),
            '--region', os.getenv('REGION'),
            '--temp_location', os.getenv('TEMP_LOCATION'),
            '--service_account_email', os.getenv('SERVICE_ACCOUNT_EMAIL'),
            '--network', os.getenv('NETWORK'),
            '--subnetwork', os.getenv('SUBNETWORK'),
            '--num_workers', os.getenv('NUM_WORKERS'),
        ]
    )
If I run it with this command line, it creates and runs the job in Google Cloud:
INPUT_SUBSCRIPTION=subscriptionname \
RUNNER=DataflowRunner \
PROJECT=project \
REGION=region \
TEMP_LOCATION=gs://somewhere/temp \
SERVICE_ACCOUNT_EMAIL=serviceaccount@project.iam.gserviceaccount.com \
NETWORK=network \
SUBNETWORK=https://www.googleapis.com/compute/v1/projects/project/regions/region/subnetworks/subnetwork \
NUM_WORKERS=3 \
python read-pubsub-with-dataflow.py
If I omit the RUNNER option, it uses DirectRunner:
INPUT_SUBSCRIPTION=subscriptionname \
PROJECT=project \
REGION=region \
TEMP_LOCATION=gs://somewhere/temp \
SERVICE_ACCOUNT_EMAIL=serviceaccount@project.iam.gserviceaccount.com \
NETWORK=network \
SUBNETWORK=https://www.googleapis.com/compute/v1/projects/project/regions/region/subnetworks/subnetwork \
NUM_WORKERS=3 \
python read-pubsub-with-dataflow.py
It fails with a lot of error messages, but I am only including the first one (I think the rest are just cascading):
INFO:apache_beam.runners.direct.direct_runner:Running pipeline with DirectRunner.
/Users/denis/redacted/env/lib/python3.6/site-packages/google/auth/_default.py:70: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
ERROR:apache_beam.runners.direct.executor:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x7fed3e368448>, due to an exception.
Traceback (most recent call last):
File "/Users/denis/redacted/env/lib/python3.6/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 694, in _read_from_pubsub
self._sub_name, max_messages=10, return_immediately=True)
File "/Users/denis/redacted/env/lib/python3.6/site-packages/google/cloud/pubsub_v1/_gapic.py", line 40, in <lambda>
fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw) # noqa
File "/Users/denis/redacted/env/lib/python3.6/site-packages/google/pubsub_v1/services/subscriber/client.py", line 1106, in pull
"If the `request` argument is set, then none of "
ValueError: If the `request` argument is set, then none of the individual field arguments should be set.
During handling of the above exception, another exception occurred:
...etc...
I suspect this might be related to credentials? Or to our project configuration? Maybe I should try a new, blank project.
It turned out to be a package version incompatibility. My requirements.txt was:
apache_beam[gcp]
google_apitools
google-cloud-pubsub
But that installed a version of google-cloud-pubsub that breaks apache_beam. I changed requirements.txt to:
apache_beam[gcp]
google_apitools
Now everything works!
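The traceback above is consistent with a newer google-cloud-pubsub client whose pull() expects either a request object or keyword fields, while the installed apache_beam still calls it the older way. One way to confirm which versions pip actually resolved is to read the installed package metadata. This is only a minimal sketch, assuming Python 3.8+ (for importlib.metadata, which the Python 3.6 environment in the traceback does not have); the helper name installed_versions is hypothetical and not part of Beam or the Pub/Sub client.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    for name, ver in installed_versions(["apache-beam", "google-cloud-pubsub"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Comparing the reported google-cloud-pubsub version before and after removing it from requirements.txt should show whether apache_beam[gcp] pulled in an older release on its own.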
For what it's worth, running locally with DirectRunner apparently does not need many of the options that DataflowRunner requires. This was enough:
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json \
RUNNER=DirectRunner \
INPUT_SUBSCRIPTION=projects/mytopic/subscriptions/mysubscription \
python read-pubsub-with-dataflow.py
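Since the local run needs so few options, the argument list could be built from only the environment variables that are actually set, instead of passing None for the missing ones. The following is a sketch under that assumption, not a tested replacement for the script above; the helper build_pipeline_args and the ENV_TO_FLAG mapping are hypothetical names, and the flags are the ones already used in the script.

```python
import os

# Environment variable -> pipeline flag, mirroring the options in the script above.
ENV_TO_FLAG = {
    "PROJECT": "--project",
    "REGION": "--region",
    "TEMP_LOCATION": "--temp_location",
    "SERVICE_ACCOUNT_EMAIL": "--service_account_email",
    "NETWORK": "--network",
    "SUBNETWORK": "--subnetwork",
    "NUM_WORKERS": "--num_workers",
}


def build_pipeline_args(env=None):
    """Build the flag list, skipping any option whose variable is unset or empty."""
    env = os.environ if env is None else env
    # Default to DirectRunner when RUNNER is not given, as the original script does.
    args = ["--runner", env.get("RUNNER", "DirectRunner")]
    for var, flag in ENV_TO_FLAG.items():
        value = env.get(var)
        if value:
            args += [flag, value]
    return args
```

The result could then be passed as the second argument to run(), so a local invocation with only RUNNER and INPUT_SUBSCRIPTION set produces just the --runner flag.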