Creating a primary key data health expectation in Palantir Foundry Code Repositories

I have a dataset that is the output of a Python transform defined in a Palantir Foundry Code Repository. It has a primary key, but since the data may change over time, I want to validate that this primary key continues to hold.

How can I create a data health expectation or check to ensure the primary key holds in the future?

You can define data expectations in your Python transform, for example:

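from transforms.api import transform_df, Input, Output, Check
from transforms import expectations as E


@transform_df(
    # Attach the check to the output so it is evaluated whenever the dataset is built.
    Output("/path/to/output", checks=[
        # Expect thing_id to be a valid primary key; the second argument is the check's name.
        Check(E.primary_key("thing_id"), "primary_key: thing_id"),
    ]),
    source_df=Input("/path/to/input"),
)
def compute(source_df):
    return source_df.select("thing_id", "thing_name").distinct()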

More information is available in the Palantir Foundry documentation on defining data expectations.
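
To make concrete what a primary key check asserts, here is a minimal plain-Python sketch that runs outside Foundry. It assumes the expectation means "every key value is non-null and no key value repeats"; `primary_key_violations` is a hypothetical helper written for illustration and is not part of the `transforms` API.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def primary_key_violations(
    rows: Iterable[Dict[str, Any]], *key_columns: str
) -> Dict[str, List[Tuple[Any, ...]]]:
    """Return the key values that are null or duplicated (hypothetical helper)."""
    # Build one key tuple per row from the chosen key columns.
    keys = [tuple(row.get(col) for col in key_columns) for row in rows]
    counts = Counter(keys)
    # A key is invalid if any of its parts is missing.
    null_keys = [key for key in counts if any(part is None for part in key)]
    # A key is invalid if it occurs on more than one row.
    duplicate_keys = [key for key, n in counts.items() if n > 1]
    return {"null": null_keys, "duplicate": duplicate_keys}


# Sample rows (made up for illustration): thing_id 2 appears twice with
# different names, and one row has no thing_id at all.
rows = [
    {"thing_id": 1, "thing_name": "a"},
    {"thing_id": 2, "thing_name": "b"},
    {"thing_id": 2, "thing_name": "c"},
    {"thing_id": None, "thing_name": "d"},
]
result = primary_key_violations(rows, "thing_id")
print(result)
```

The sample rows are all distinct as (thing_id, thing_name) pairs, so calling `.distinct()` on both columns would keep every one of them while thing_id 2 still repeats. That is the kind of drift the check on the output is meant to catch.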
