
How do I transform the data set into a dictionary inside the repo? I am using PySpark within Foundry

I created a Fusion sheet whose data is synced to a dataset. Now I want to use that dataset to create a dictionary in the repo (I am using PySpark in the repo). Later I want to pass that dictionary so that it populates column descriptions, as described in "Is there a tool available within Foundry that can automatically populate column descriptions? If so, what is it called?".

It would be great if anyone could help me create the dictionary from the dataset using PySpark in the repo.

The following code would convert your PySpark DataFrame into a list of dictionaries:

# Collect the DataFrame to the driver and convert each Row object to a dict
fusion_rows = [row.asDict() for row in fusion_df.collect()]
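
As a quick illustration (assuming fusion_df only has the two columns column_name and description shown in the example sheet further below), fusion_rows is then a plain Python list of dicts:

for row_dict in fusion_rows:
    print(row_dict)
# {'column_name': 'col_A', 'description': 'description for A'}
# {'column_name': 'col_B', 'description': 'description for B'}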

However, in your particular case, you can use the following snippet:

# Build a {column_name: description} dictionary from the Fusion-backed dataset
col_descriptions = {row["column_name"]: row["description"] for row in fusion_df.collect()}

# Write the input dataset through unchanged, attaching the column descriptions
my_output.write_dataframe(
    my_input.dataframe(),
    column_descriptions=col_descriptions
)
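
For context, here is a minimal sketch of how this could sit inside a full transform in your repo. The dataset paths ("/path/to/...") and the input name fusion_input are placeholders, and it assumes the standard transforms.api decorators available in a Foundry code repository:

from transforms.api import transform, Input, Output


@transform(
    my_output=Output("/path/to/output_dataset"),        # placeholder output path
    my_input=Input("/path/to/source_dataset"),          # dataset whose columns get descriptions
    fusion_input=Input("/path/to/fusion_sheet_sync"),   # synced Fusion sheet with column_name / description
)
def compute(my_output, my_input, fusion_input):
    fusion_df = fusion_input.dataframe()

    # Dictionary mapping each column name to its description
    col_descriptions = {
        row["column_name"]: row["description"] for row in fusion_df.collect()
    }

    # Pass the descriptions along when writing the output
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions=col_descriptions,
    )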

Assuming your Fusion sheet looks like this:

+------------+------------------+
| column_name|       description|
+------------+------------------+
|       col_A| description for A|
|       col_B| description for B|
+------------+------------------+
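
With that input, the dictionary built above would come out as:

col_descriptions = {
    "col_A": "description for A",
    "col_B": "description for B",
}

and those descriptions are attached to col_A and col_B of the output dataset when write_dataframe is called.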
