
How to convert PySpark DataFrame columns into a dict

I have a DataFrame with 2 columns:

Col1: String, Col2:String.

I want to create a dict like {'col1': 'col2'}, with each Col1 value as a key and the Col2 value from the same row as its value.

For example, the CSV data below:

var1,InternalCampaignCode
var2,DownloadFileName
var3,ExternalCampaignCode

has to become:

{'var1':'InternalCampaignCode','var2':'DownloadFileName', ...}

The DataFrame has around 200 records.

Please let me know how to achieve this.

The following should do the trick:

# Build a real list; in Python 3 a bare map() would return a lazy iterator.
df_as_dict = [row.asDict() for row in df.collect()]

Note that this generates a list of dictionaries, where each dictionary represents a single record of your PySpark DataFrame:

[
  {'Col1': 'var1', 'Col2': 'InternalCampaignCode'},
  {'Col1': 'var2', 'Col2': 'DownloadFileName'},
  {'Col1': 'var3', 'Col2': 'ExternalCampaignCode'},
]
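The question asks for a single mapping rather than one dictionary per record. A minimal sketch of collapsing such a list into one dict is below; the name `records` and its hard-coded contents are illustrative stand-ins for the list shown above, not part of the original answer.

```python
# Stand-in for the list of per-record dictionaries that row.asDict() yields;
# hard-coded here so the snippet runs without a Spark session.
records = [
    {'Col1': 'var1', 'Col2': 'InternalCampaignCode'},
    {'Col1': 'var2', 'Col2': 'DownloadFileName'},
    {'Col1': 'var3', 'Col2': 'ExternalCampaignCode'},
]

# Use each record's Col1 value as the key and its Col2 value as the value.
mapping = {rec['Col1']: rec['Col2'] for rec in records}
print(mapping)
```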

You can use a dict comprehension:

result = {r[0]: r[1] for r in df.collect()}

which gives:

{'var1': 'InternalCampaignCode', 'var2': 'DownloadFileName', 'var3': 'ExternalCampaignCode'}
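One caveat with a comprehension like this: if the first column contains repeated values, later rows silently overwrite earlier ones. A hedged sketch of a stricter variant follows. `rows_to_dict` is a hypothetical helper, and plain tuples stand in for the `Row` objects that `df.collect()` returns (a `Row` is a tuple subclass, so it can be indexed by position the same way).

```python
def rows_to_dict(rows):
    """Build {first_column: second_column}, rejecting duplicate keys."""
    result = {}
    for row in rows:
        key, value = row[0], row[1]
        if key in result:
            raise ValueError(f"duplicate key in first column: {key!r}")
        result[key] = value
    return result


# Plain tuples are used instead of df.collect() so the sketch runs without Spark.
sample_rows = [
    ('var1', 'InternalCampaignCode'),
    ('var2', 'DownloadFileName'),
    ('var3', 'ExternalCampaignCode'),
]
print(rows_to_dict(sample_rows))
```

With a real DataFrame the call would presumably be `rows_to_dict(df.collect())`. Since `collect()` brings every row to the driver, this suits a DataFrame of around 200 records but not a very large one.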


 