
Create a dataframe from column of dictionaries in pyspark

I want to create a new dataframe from an existing dataframe in pyspark. The dataframe "df" contains a column named "data" whose rows hold dictionaries stored as strings, so the column's schema is string. The keys of each dictionary are not fixed: for example, name and address are the keys of the first row's dictionary, but other rows may have different keys. The following is an example:

 data
 ------------------------------------------------------
 {"name": "sam", "address": "uk"}
 {"name": "jack", "address": "aus", "occupation": "job"}

How do I convert this into a dataframe with individual columns, like the following?

 name   address    occupation
 sam       uk       
 jack      aus       job

Convert data to an RDD, then use spark.read.json to turn the RDD into a DataFrame, letting Spark infer the schema.

from pyspark.sql import SparkSession

data = [
    {"name": "sam", "address": "uk"},
    {"name": "jack", "address": "aus", "occupation": "job"}
]

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# read.json infers the union of all keys across rows as the schema;
# na.fill('') replaces the resulting nulls with empty strings.
df = spark.read.json(sc.parallelize(data)).na.fill('')
df.show()
+-------+----+----------+
|address|name|occupation|
+-------+----+----------+
|     uk| sam|          |
|    aus|jack|       job|
+-------+----+----------+

If the order of rows is not important, this is another way to do it:

from pyspark.sql import SparkSession

# toDF() is only attached to RDDs once a SparkSession exists.
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# toDF() on an RDD of dictionaries infers the schema from the keys
# (this path is deprecated in recent Spark versions but still works).
df = sc.parallelize([
    {"name": "jack", "address": "aus", "occupation": "job"},
    {"name": "sam", "address": "uk"}
]).toDF()

df = df.na.fill('')

df.show()

+-------+----+----------+
|address|name|occupation|
+-------+----+----------+
|    aus|jack|       job|
|     uk| sam|          |
+-------+----+----------+
