Creating a schema data health expectation in Palantir Foundry Code Repositories

I have a dataset that is the output of a Python transform defined in a Palantir Foundry Code Repository. It has a known set of columns (around 73), but because the data may change over time, I want to validate that these columns are still present in the future.

How can I create a data health expectation or check to ensure that all 73 columns remain present?

You can use expectations to make assertions about which columns exist in your output schema.

See the official docs for schema expectations.

There are 3 kinds of schema expectations:

# Assert some columns exist.
E.schema().contains({'col1': type1, 'col2': type2})

# Assert the schema contains only columns from the given set (but not necessarily all of them).
E.schema().is_subset_of({'col1': type1, 'col2': type2})

# Assert the schema contains exactly the given columns.
E.schema().equals({'col1': type1, 'col2': type2})
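
To make the difference between the three checks concrete, here is a minimal sketch that models a schema as a plain Python dict of column name to type. The helper names (schema_contains, schema_is_subset_of, schema_equals) are hypothetical and only mirror the comments above; this is not how Foundry implements expectations, and it assumes types are compared along with names.

```python
# Illustrative model only: schemas are plain dicts of column name -> type label.
# The helper names are hypothetical and are not part of transforms.expectations.

def schema_contains(actual, expected):
    """True if every expected column is present in actual with the same type."""
    return all(
        name in actual and actual[name] == dtype
        for name, dtype in expected.items()
    )


def schema_is_subset_of(actual, allowed):
    """True if every actual column appears in allowed with the same type."""
    return all(
        name in allowed and allowed[name] == dtype
        for name, dtype in actual.items()
    )


def schema_equals(actual, expected):
    """True if actual has exactly the expected columns and types."""
    return actual == expected


# An output that has both required columns plus one extra column.
actual = {"col1": "string", "col2": "integer", "extra": "double"}
expected = {"col1": "string", "col2": "integer"}

contains_ok = schema_contains(actual, expected)    # extra columns are allowed
subset_ok = schema_is_subset_of(actual, expected)  # 'extra' is not in the allowed set
equals_ok = schema_equals(actual, expected)        # schemas differ by one column
```

For the question asked here (the 73 columns must keep existing, while new columns may appear), the "contains" behaviour is the one that fits.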

Additionally, for checking a single column, you can use E.col('col1').exists(). But for 73 columns you're better off going with E.schema().

So for a more fleshed-out example, you might have something like:

from transforms.api import transform_df, Check, Input, Output
import transforms.expectations as E
from pyspark.sql import types as T

COLUMNS_WHICH_MUST_EXIST = {
    'string_column': T.StringType(),
    'number_column': T.IntegerType(),
    # ...and 71 more.
}

@transform_df(
    Output("ri.foundry.main.dataset.abcdef", checks=[
        Check(E.schema().contains(COLUMNS_WHICH_MUST_EXIST), "contains important columns"),
    ]),
    input_data=Input("ri.foundry.main.dataset.12345678"),
)
def compute(input_data):
    # ... your logic here; returning the input unchanged is only a placeholder
    return input_data
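
With around 73 entries, a dictionary like COLUMNS_WHICH_MUST_EXIST is easy to get wrong by listing a column twice. One possible way to keep it manageable is to group column names by type and build the mapping in a small helper. The sketch below is plain Python; build_expected_columns and the column names are hypothetical, and the string type labels stand in for the pyspark types (such as T.StringType()) that you would pass in a real transform.

```python
# Hypothetical helper: builds a {column_name: type} mapping from grouped names.
# The string labels below are placeholders for pyspark types such as T.StringType().

def build_expected_columns(groups):
    """Flatten (type, [names]) pairs into one dict, rejecting duplicate names."""
    expected = {}
    for dtype, names in groups:
        for name in names:
            if name in expected:
                raise ValueError(f"column listed twice: {name}")
            expected[name] = dtype
    return expected


columns = build_expected_columns([
    ("string", ["string_column", "another_string_column"]),
    ("integer", ["number_column"]),
])
```

The resulting mapping could then be passed to E.schema().contains(...) in the same way as the hand-written dictionary above.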

Also see the official docs for expectation checks for more details on the available options.
