No module named 'IPython' on GCP DataflowRunner with Apache Beam

I have a fairly simple Apache Beam pipeline in Python that I set up in a Jupyter notebook and would like to deploy to a Dataflow runner. I am fairly new to all three of these. I am using the Python 3 and Apache Beam 2.27.0 kernel.

My pipeline options look something like this:

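from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions, SetupOptions

# jobid is defined elsewhere in the notebook
options = PipelineOptions()
options.view_as(GoogleCloudOptions).project = 'inspired-studio-11111'
options.view_as(GoogleCloudOptions).job_name = 'Dataflow Test Job2' + jobid
options.view_as(GoogleCloudOptions).region = 'us-central1'
options.view_as(GoogleCloudOptions).staging_location = 'gs://bucket/staging'
options.view_as(GoogleCloudOptions).temp_location = 'gs://bucket/temp'
# Pickle the notebook's main session and ship it to the workers
options.view_as(SetupOptions).save_main_session = True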

The pipeline runs fine in the notebook and interacts with GCP storage. When I run it on the GCP Dataflow runner, I consistently get the following exception:

Error message from worker: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 771, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 512, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 318, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'IPython'

Installing and importing ipython in my notebook did not help. Does this need to be configured somewhere on the GCP VM?

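That error is usually caused by the save_main_session=True option. With it set, Beam pickles the notebook's main session and each worker tries to load it on startup (the pickler.load_session call in the traceback). In a notebook, that session typically references IPython objects, and IPython is not installed on the Dataflow workers, so the load fails. See "Handle NameErrors when launching Dataflow jobs with Apache Beam notebooks" for a discussion of other ways of making sure the workers have the right code available at runtime.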
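
To illustrate the failure mode, here is a minimal sketch that uses only the standard library. It is not Beam's actual code path: Beam uses dill, while this uses plain pickle, and the module name notebook_only and the class DisplayHandle are hypothetical stand-ins for IPython and the objects a notebook keeps in its global namespace. The point is only that a pickled session referencing a module that is absent on the loading side fails with the same kind of ModuleNotFoundError.

```python
import pickle
import sys
import types

# Hypothetical stand-in for IPython: a module that exists only in the notebook.
notebook_only = types.ModuleType("notebook_only")


class DisplayHandle:
    """Stand-in for an object the notebook keeps in its global namespace."""


# Make the class look as if it were defined inside the notebook-only module.
DisplayHandle.__module__ = "notebook_only"
notebook_only.DisplayHandle = DisplayHandle
sys.modules["notebook_only"] = notebook_only

# "Notebook" side: the session holds pipeline data plus a notebook-only object.
main_session = {"threshold": 10, "display": DisplayHandle()}
session_bytes = pickle.dumps(main_session)

# "Worker" side: simulate an environment where the module is not installed.
del sys.modules["notebook_only"]


def load_on_worker(payload):
    """Return the unpickled session, or a message naming the missing module."""
    try:
        return pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return f"missing module: {exc.name}"


# The full session cannot be loaded because it references notebook_only.
result_full = load_on_worker(session_bytes)
# A session without the notebook-only object loads without trouble.
result_clean = load_on_worker(pickle.dumps({"threshold": 10}))

print(result_full)
print(result_clean)
```

In this sketch the first load fails because the pickled data names a module the "worker" cannot import, while the second succeeds because nothing notebook-specific was serialized. By analogy, leaving save_main_session unset and keeping the imports a transform needs inside the function or DoFn that uses them is one way to avoid shipping notebook-only references to the workers.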
