
Pipeline code spanning multiple files in Apache Beam / Dataflow

After a lengthy search, I haven't found an example of a Dataflow / Beam pipeline that spans several files. The Beam docs do suggest a file structure (under the section "Multiple File Dependencies"), but the Juliaset example they give effectively has a single code/source file (plus the main file that calls it). Based on the Juliaset example, I need a similar file structure:

juliaset/__init__.py
juliaset/juliaset.py # actual code
juliaset/some_conf.py
__init__.py
juliaset_main.py
setup.py

Now I want to import .some_conf from juliaset/juliaset.py, which works when run locally but gives me an error when run on Dataflow:

INFO:root:2017-12-15T17:34:09.333Z: JOB_MESSAGE_ERROR: (8cdf3e226105b90a): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 706, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 446, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
ImportError: No module named package_name.juliaset.some_conf

A full working example would be very much appreciated!

Can you verify that your setup.py contains a structure like:

import setuptools

setuptools.setup(
    name='My Project',
    version='1.0',
    install_requires=[],
    packages=setuptools.find_packages(),
)
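As a quick sanity check, you can recreate the layout above in a temporary directory (with empty placeholder files) and confirm that setuptools.find_packages() discovers the juliaset package — that is what gets packaged and staged to the Dataflow workers:

```python
# Recreate the Juliaset layout in a temp directory and check that
# setuptools.find_packages() discovers the package (illustrative only).
import os
import tempfile
import setuptools

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'juliaset'))
    for rel in ('__init__.py',
                os.path.join('juliaset', '__init__.py'),
                os.path.join('juliaset', 'juliaset.py'),
                os.path.join('juliaset', 'some_conf.py')):
        open(os.path.join(root, rel), 'w').close()
    packages = setuptools.find_packages(where=root)
    print(packages)  # ['juliaset']
```

Note that find_packages() only picks up directories that contain an __init__.py, so a missing __init__.py silently drops the package from what is shipped to the workers.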

Import your modules with absolute imports, like from juliaset.juliaset import SomeClass
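To see the absolute-import form work end to end, here is a small self-contained sketch (the SETTING constant is made up for illustration) that builds the layout in a temporary directory and imports some_conf through the package, the way the workers resolve it:

```python
# Build the package layout on the fly and import some_conf via an
# absolute import rather than a relative one (illustrative only).
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'juliaset'))
open(os.path.join(root, 'juliaset', '__init__.py'), 'w').close()
with open(os.path.join(root, 'juliaset', 'some_conf.py'), 'w') as f:
    f.write("SETTING = 'value'\n")  # hypothetical config value

sys.path.insert(0, root)
some_conf = importlib.import_module('juliaset.some_conf')
print(some_conf.SETTING)  # value
```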

And when you call the Python script, use python -m juliaset_main (without the .py).
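For the Dataflow run itself, the key is passing --setup_file so your whole package is built and installed on the workers. A sketch of the invocation (project, region, and bucket names are placeholders):

```shell
# Placeholder project/bucket names; --setup_file points at the setup.py above
python -m juliaset_main \
  --runner DataflowRunner \
  --project my-gcp-project \
  --region us-central1 \
  --temp_location gs://my-bucket/tmp \
  --setup_file ./setup.py
```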

Not sure if you already tried this, but just to be sure.

