
Palantir Foundry code workbook, export individual XMLs from dataset

I have a dataset with an XML column, and I am trying to export each XML as its own file, with the filename taken from another column, using a code workbook.


I filtered the rows I want using the code below:

def prepare_input(xml_with_debug):
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    filter_column = "key"
    filter_value = "test_key"
    df_filtered = xml_with_debug.filter(F.col(filter_column) == filter_value)

    approx_number_of_rows = 1
    sample_percent = float(approx_number_of_rows) / df_filtered.count()

    df_sampled = df_filtered.sample(False, sample_percent, seed=0)

    important_columns = ["key", "xml"]

    # StringType lives in pyspark.sql.types, not pyspark.sql.functions
    return df_sampled.select([F.col(c).cast(StringType()).alias(c) for c in important_columns])

It works up to this point. For the last part I tried the following in a Python task, but it complained about the parameters (I must have set it up wrongly). And even if it worked, I think it would still produce a single file.

from transforms.api import transform, Input, Output

@transform(
    output=Output("/path/to/python_csv"),
    my_input=Input("/path/to/input")
)
def my_compute_function(output, my_input):
    output.write_dataframe(my_input.dataframe().coalesce(1), output_format="csv", options={"header": "true"})

I am trying to set it up in the GUI like below:


My question, I guess, is: what should the code be in the last Python task (write_file) after prepare_input, so that I extract the individual XMLs (and, if possible, zip them into a single file for download)?

You can access the output dataset's filesystem and write files into it in whatever format you want.
The documentation for that can be found here: https://www.palantir.com/docs/foundry/code-workbook/transforms-unstructured/#writing-files
(If you want to do it from a code repository it's very similar: https://www.palantir.com/docs/foundry/transforms-python/unstructured-files/#writing-files )

By doing that you can create multiple different files, or you can create a single zip file and write it into a dataset.
