How to convert PySpark dataframe columns to a dictionary

Question

I have a dataframe with 2 columns.

Col1: String, Col2: String.

I want to create a dict like {'col1':'col2'}, mapping each value in the first column to the value in the second.

For example, the CSV data below:

var1,InternalCampaignCode
var2,DownloadFileName
var3,ExternalCampaignCode

has to become:

{'var1':'InternalCampaignCode','var2':'DownloadFileName', ...}

The dataframe has around 200 records.

Please let me know how to achieve this.

Answer 1

The following should do the trick:

df_as_dict = [row.asDict() for row in df.collect()]

Note that this generates a list of dictionaries, where each dictionary represents a single record of your PySpark dataframe:

[
  {'Col1': 'var1', 'Col2': 'InternalCampaignCode'},
  {'Col1': 'var2', 'Col2': 'DownloadFileName'},
  {'Col1': 'var3', 'Col2': 'ExternalCampaignCode'},
]
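The list of per-record dictionaries above is not yet the single mapping the question asks for. A minimal sketch of that last step, assuming the column names are Col1 and Col2 as in the question; the literal `records` list is a stand-in for the collected result so the snippet runs without a Spark session:

```python
# Stand-in for the list of per-row dictionaries built from df.collect().
records = [
    {"Col1": "var1", "Col2": "InternalCampaignCode"},
    {"Col1": "var2", "Col2": "DownloadFileName"},
    {"Col1": "var3", "Col2": "ExternalCampaignCode"},
]

# Fold the records into one dict: Col1 values become keys, Col2 values become values.
mapping = {rec["Col1"]: rec["Col2"] for rec in records}

print(mapping)
```

If a Col1 value appears more than once, the later record overwrites the earlier one.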

Answer 2

You can do a dict comprehension:

result = {r[0]: r[1] for r in df.collect()}

which gives:

{'var1': 'InternalCampaignCode', 'var2': 'DownloadFileName', 'var3': 'ExternalCampaignCode'}
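A dict comprehension silently keeps the last value when a key repeats in the first column. The sketch below assumes you would rather treat duplicate keys as an error. `rows_to_dict` is a hypothetical helper name, and plain tuples stand in for the Row objects returned by df.collect() (a Row is a tuple subclass, so r[0] and r[1] index it the same way):

```python
from collections import Counter


def rows_to_dict(rows):
    """Build {first_column: second_column}, raising if any key repeats."""
    rows = list(rows)
    counts = Counter(r[0] for r in rows)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate keys in first column: {duplicates}")
    return {r[0]: r[1] for r in rows}


# Stand-in for df.collect(); with a real dataframe you would pass df.collect() instead.
rows = [
    ("var1", "InternalCampaignCode"),
    ("var2", "DownloadFileName"),
    ("var3", "ExternalCampaignCode"),
]

result = rows_to_dict(rows)
print(result)
```

Collecting to the driver is reasonable here because the dataframe holds only about 200 records.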
