
Testing in Palantir Foundry

In Palantir Foundry, I can see that unit tests can be written using Pytest or TransformRunner. My understanding is that with Pytest we cannot pass the output of a transform into a unit test, and with TransformRunner we cannot use the dataset the transform actually reads, so we need separate test data. However, I would like to use the whole input dataset that my transform runs on in production and run tests on its output. How can I achieve that?

You can't access Foundry datasets from the CI. You'll need to keep a snippet of the data in a file within your repo and then load it.

test/fixtures/data/input/a.csv

col_a,col_b
1,2
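
import os

from transforms.api import Pipeline, transform_df, Input, Output
from transforms.verbs.testing.TransformRunner import TransformRunner

# Directory in the repo that holds the fixture data, relative to this test file.
TEST_DATA_DIR = os.path.join(os.path.dirname(__file__), '..', '..', 'fixtures', 'data')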


def test_runner_single_table(spark_session):
    pipeline = Pipeline()

    @transform_df(Output('/test_single_table/output/test'),
                  input_a=Input('/test_single_table/input/a'))
    def transform_1(input_a):
        return input_a.withColumn('col_c', input_a['col_a'] + input_a['col_b'])

    pipeline.add_transforms(transform_1)

    runner = TransformRunner(pipeline, '/test_single_table', TEST_DATA_DIR)

    output = runner.build_dataset(spark_session, '/test_single_table/output/test')
    assert output.first()['col_c'] == 3

TransformRunner translates the Input path into a directory path. In the example above:

  • TEST_DATA_DIR tells the runner where the data is in your environment
  • '/test_single_table' tells the runner which subpath can be ignored, since this path only exists on Foundry datasets, not within your repo
  • input/a is resolved against Input('[ignored_sub_path]/input/a') and the folder structure you defined in your repo
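
To make the mapping above concrete, here is a rough sketch of the idea in plain Python. This is not TransformRunner's actual implementation: resolve_fixture_path is a hypothetical helper written only for illustration, and it stops at the extensionless path, so how the runner then picks up a.csv is not modelled.

```python
import os


def resolve_fixture_path(dataset_path, ignored_prefix, data_dir):
    """Hypothetical helper: map a Foundry-style dataset path to a local path.

    Illustration only; this is an assumption about the idea, not the
    code that TransformRunner runs internally.
    """
    # The prefix (e.g. '/test_single_table') only exists in Foundry,
    # so the dataset path must sit underneath it.
    if not dataset_path.startswith(ignored_prefix + '/'):
        raise ValueError(f'{dataset_path!r} is not under {ignored_prefix!r}')
    # Drop the prefix, leaving the repo-relative part such as 'input/a'.
    relative = dataset_path[len(ignored_prefix):].lstrip('/')
    # Join what remains onto the local fixture directory.
    return os.path.join(data_dir, *relative.split('/'))


local = resolve_fixture_path(
    '/test_single_table/input/a',   # path used in Input(...)
    '/test_single_table',           # subpath that is ignored
    os.path.join('test', 'fixtures', 'data'),
)
print(local)
```

The same reasoning applies to the Output path: only the part after the ignored prefix matters for the layout inside the repo.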

If you want to understand these properties better, you can print them and the output will show up in the CI checks.
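
One way to do that printing, as a small sketch using only the standard library: the helper names list_fixture_files and print_fixture_layout are made up for this example, and whether the printed lines are visible may depend on how output capture is configured for the test run.

```python
import os


def list_fixture_files(data_dir):
    """Return all files under data_dir as sorted paths relative to data_dir."""
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, data_dir))
    return sorted(found)


def print_fixture_layout(data_dir):
    """Print the fixture directory and its contents so they land in the test log."""
    print('fixture dir:', os.path.abspath(data_dir))
    for rel in list_fixture_files(data_dir):
        print('  ', rel)
```

Calling print_fixture_layout(TEST_DATA_DIR) at the start of the test shows which files the runner has available, which helps when an Input path does not resolve the way you expected.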
