
Google Cloud Dataflow: ModuleNotFoundError: No module named 'main' when running integration test

I have an Apache Beam pipeline that works fine in both local and cloud modes. However, I have an end-to-end integration test that I run in every MR, and the IT is submitted to Dataflow.

This time, the IT is throwing the following error:

_import_module: return __import__(import_name)
ModuleNotFoundError: No module named 'main'

The stack trace is not pointing at all to the place where the module is not recognised. Just the following:

job-v2-test-20-08160911-vs73-harness-drt8
      Root cause: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/apache_beam/internal/dill_pickler.py", line 285, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.8/site-packages/dill/_dill.py", line 826, in _import_module
    return __import__(import_name)
ModuleNotFoundError: No module named 'main'

The main module is used only in the IT file; it doesn't appear in any transform of the pipeline. Also, when I run the IT, half of the pipeline's transforms run successfully until it hangs with the error above.

The IT code:

from datetime import datetime
import argparse
import logging
import unittest

from hamcrest.core.core.allof import all_of

from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.runner import PipelineState
from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher
from apache_beam.testing.test_pipeline import TestPipeline

from main import run
# get_bq_instance, IT_BUCKET, IT_DATASET and IT_OUTPUT are project-specific
# helpers/constants whose imports are omitted in this post.


class PipelineIT(unittest.TestCase):

    def setUp(self):

        self.test_pipeline = TestPipeline(is_integration_test=True)

        parser = argparse.ArgumentParser()
        self.args, self.beam_args = parser.parse_known_args()
        self.pipeline_options = PipelineOptions(self.beam_args)
        self.client = get_bq_instance()
        self.tables_timestamp = datetime.now().strftime("%Y%m%d%H%M")

    def test_mc_end_to_end(self):

        state_verifier = PipelineStateMatcher(PipelineState.DONE)
        extra_opts = {
            'input': IT_BUCKET,
            'output_dataset': IT_DATASET,
            'output': IT_OUTPUT,
            'bq_timestamp': self.tables_timestamp,
            'on_success_matcher':
                all_of(state_verifier)
        }

        run(self.test_pipeline.get_full_options_as_args(**extra_opts), save_main_session=True)

# bunch of asserts

The command I'm using to run the IT:

coverage run -m  pytest --log-cli-level=INFO integration_tests/end_to_end_it_test.py --job_name "end_to_end_it" --test-pipeline-options=" --run_mode=cloud --mode=test --setup_file=path_to_setup.py"

The pipeline works fine in production mode, but in testing mode it shows that error. I'm just wondering: if main is used only to trigger the integration test locally, how can it break the pipeline with this error?

After deep investigation: in my pipeline, I was using beam.Filter in the following way:

dropped_and_missing = all_recs | 'Filter Dropped and Missing recs' >> beam.Filter(lambda rec: rec['existing_status'] == 'Dropped' or rec['existing_status'] == 'Missing')

Replacing that code block with a PTransform based on if conditions solved the issue.
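
For reference, a minimal sketch of that kind of replacement (the class and step names here are mine, not from the original pipeline): the lambda's condition is reproduced with explicit if statements inside a DoFn, wrapped in a small PTransform, so no lambda needs to be pickled at all:

import apache_beam as beam


class _StatusFilterFn(beam.DoFn):
    # Keeps records whose existing_status is 'Dropped' or 'Missing',
    # mirroring the original lambda's condition with explicit ifs.
    def process(self, rec):
        if rec['existing_status'] == 'Dropped':
            yield rec
        elif rec['existing_status'] == 'Missing':
            yield rec


class FilterDroppedAndMissing(beam.PTransform):
    # Same filtering behaviour as the original beam.Filter step.
    def expand(self, pcoll):
        return pcoll | 'Keep Dropped and Missing' >> beam.ParDo(_StatusFilterFn())


# Usage, keeping the original step label:
# dropped_and_missing = all_recs | 'Filter Dropped and Missing recs' >> FilterDroppedAndMissing()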

I don't know where the issue is. I tried to dig into the source code, checking whether there is any reference to a main module in the Filter function, but there isn't.

Also suspicious: the error occurs only when running the integration test from the command line. The pipeline works fine with LocalRunner and DataflowRunner.
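
A plausible explanation, though nothing in this post confirms it: when the job is submitted, dill may pickle objects by reference to the module that defined them, and something reachable from that lambda ended up referencing the test entry module, imported as main. The workers cannot import main (hence __import__(import_name) failing inside _import_module). If that is the cause, a named predicate defined in the packaged pipeline code, shipped to the workers via --setup_file, should avoid the error as well. A hypothetical sketch:

import apache_beam as beam

# Hypothetical module inside the pipeline package (e.g. a filters module),
# so it is importable on the workers through the --setup_file package
# rather than through 'main'.
def is_dropped_or_missing(rec):
    return rec['existing_status'] in ('Dropped', 'Missing')

# dropped_and_missing = all_recs | 'Filter Dropped and Missing recs' >> beam.Filter(is_dropped_or_missing)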
