Palantir Foundry Code Workbook: export individual XMLs from a dataset
I have a dataset with an xml column, and I am trying to use Code Workbook to export each XML as its own file, with the filename taken from another column.
I filtered the rows I want using the code below:
def prepare_input(xml_with_debug):
    from pyspark.sql import functions as F

    # Keep only the rows whose key matches the value of interest.
    filter_column = "key"
    filter_value = "test_key"
    df_filtered = xml_with_debug.filter(F.col(filter_column) == filter_value)

    # sample() is approximate, so this may return more or fewer rows than requested.
    approx_number_of_rows = 1
    sample_percent = float(approx_number_of_rows) / df_filtered.count()
    df_sampled = df_filtered.sample(False, sample_percent, seed=0)

    # Cast the columns of interest to strings.
    important_columns = ["key", "xml"]
    return df_sampled.select([F.col(c).cast("string").alias(c) for c in important_columns])
It works up to this point. For the last part I tried the following in a Python task, but it complained about the parameters (I probably set it up incorrectly). Even if it worked, I think it would write everything into a single file.
from transforms.api import transform, Input, Output

@transform(
    output=Output("/path/to/python_csv"),
    my_input=Input("/path/to/input")
)
def my_compute_function(output, my_input):
    output.write_dataframe(my_input.dataframe().coalesce(1), output_format="csv", options={"header": "true"})
I am trying to set this up in the Code Workbook GUI.
My question, I guess, is: what should the code in the last Python task (write_file), after prepare_input, look like so that I can extract the individual XMLs (and, if possible, zip them into a single file for download)?
You can access the output dataset's filesystem and write files into it in whatever format you want.
The documentation for that can be found here: https://www.palantir.com/docs/foundry/code-workbook/transforms-unstructured/#writing-files
(If you want to do it from a code repository, it is very similar: https://www.palantir.com/docs/foundry/transforms-python/unstructured-files/#writing-files )
That way you can either create multiple separate files or create a single zip file and write it into a dataset.
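As a rough illustration of the zip option, here is a minimal sketch that packs (filename, xml) pairs into an in-memory zip archive using only the Python standard library. The helper name build_zip and the sample rows are hypothetical. The commented-out write_file task at the end is an assumption based on the Code Workbook documentation linked above (Transforms.get_output() and its filesystem), and it also assumes the "key" column holds the filename, so check both against your own workbook.

```python
import io
import zipfile


def build_zip(rows):
    """Pack (filename, xml_text) pairs into a zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, xml_text in rows:
            # Each row becomes one entry in the archive.
            archive.writestr(filename, xml_text)
    return buffer.getvalue()


# Hypothetical sample data standing in for the (key, xml) rows of the dataset.
rows = [("a.xml", "<a/>"), ("b.xml", "<b/>")]
zip_bytes = build_zip(rows)

# Assumed shape of the Code Workbook task (not runnable outside Foundry):
#
# def write_file(prepare_input):
#     # collect() pulls every row to the driver, so filter the input first.
#     rows = [(r["key"] + ".xml", r["xml"]) for r in prepare_input.collect()]
#     output = Transforms.get_output()
#     with output.filesystem().open("xmls.zip", "wb") as f:
#         f.write(build_zip(rows))
```

To write separate files instead of one archive, the same loop could open one output file per row rather than calling build_zip.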