How to create a DataFrame from a dict column in another DataFrame?
I have a column in a Spark DataFrame.
Output from df.select('parsed').show():
+--------------------+
| parsed|
+--------------------+
|{Action Flags=I, ...|
|{Action Flags=I, ...|
|{Action Flags=I, ...|
|{Action Flags=I, ...|
+--------------------+
All elements of this column are dicts.
How can I make a new Spark DataFrame from these dicts, using the keys as column names?
Before converting a column that holds dicts into separate columns, you need to know its keys, because they become the names of the new columns. Below I create a sample DataFrame and then convert the dict keys to columns.
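# sqlContext is the legacy entry point; with a SparkSession, spark.createDataFrame works the same way
df = sqlContext.createDataFrame([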
[{'a':1,'b':2, 'c': 3}],
[{'a':1,'b':2, 'c': 3}],
[{'a':1,'b':2, 'c': 3}]], ["col"]
)
df.show(truncate=False)
+---------------------------+
|col |
+---------------------------+
|Map(b -> 2, c -> 3, a -> 1)|
|Map(b -> 2, c -> 3, a -> 1)|
|Map(b -> 2, c -> 3, a -> 1)|
+---------------------------+
After creating the sample DataFrame, get the first row from it:
first_row = df.first()['col']  # dict (map) value of "col" in the first row
print (first_row)
{u'a': 1, u'b': 2, u'c': 3}
Now that we have the dict from the first row, extract its keys so we can create a column for each one:
columns = first_row.keys()
print (columns)
[u'a', u'c', u'b']
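Taking the keys from the first row only assumes that every row has the same keys. If some rows may have extra or missing keys, you need the union of keys over all rows. The sketch below models that step in plain Python (no Spark required): each dict stands in for one row's map value, and the `rows` data and the `all_keys` helper are hypothetical. In PySpark, something like `[r[0] for r in df.select(F.explode(F.map_keys("col"))).distinct().collect()]` should give the same union, though that line is a suggestion and not part of the original answer.

```python
# Plain-Python model: each dict stands for one row's map value (hypothetical data).
rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2},
    {"a": 1, "d": 4},
]


def all_keys(dicts):
    """Return the sorted union of keys over every dict, not just the first one."""
    keys = set()
    for d in dicts:
        keys.update(d.keys())
    return sorted(keys)


print(all_keys(rows[:1]))  # keys seen in the first row only
print(all_keys(rows))      # keys seen in any row, including "d"
```

Sorting is only there to give a stable column order; any fixed order works for building the select list.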
Then loop over the key list and select each key from the dict column as its own column:
from pyspark.sql import functions as F
col_list = [F.col("col").getItem(col).alias(col) for col in columns]
df.select(col_list).show()
+---+---+---+
| a| c| b|
+---+---+---+
| 1| 3| 2|
| 1| 3| 2|
| 1| 3| 2|
+---+---+---+
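To see what the select does row by row, here is a plain-Python model of it (not Spark code): every dict becomes one flat tuple in the chosen column order. It assumes a key that is missing from a row comes back as None, which corresponds to Spark returning null from getItem for a missing map key with ANSI mode off; the `flatten` helper and the sample rows are hypothetical.

```python
def flatten(dicts, columns):
    """Build one tuple per dict, in the given column order.

    A key missing from a dict becomes None (assumed to mirror Spark's null
    for a missing map key when ANSI mode is off).
    """
    return [tuple(d.get(c) for c in columns) for d in dicts]


rows = [{"a": 1, "b": 2, "c": 3}, {"a": 1, "b": 2}]
columns = ["a", "c", "b"]
for r in flatten(rows, columns):
    print(r)
```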
There are other ways to do this too. Above is one; below is a second, which adds each new column with withColumn:
for cl in columns:  # reuse the columns list built above
df = df.withColumn(cl, F.col("col").getItem(cl))
df.show(truncate=False)
+---------------------------+---+---+---+
|col |a |c |b |
+---------------------------+---+---+---+
|Map(b -> 2, c -> 3, a -> 1)|1 |3 |2 |
|Map(b -> 2, c -> 3, a -> 1)|1 |3 |2 |
|Map(b -> 2, c -> 3, a -> 1)|1 |3 |2 |
+---------------------------+---+---+---+
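Unlike the select version, the withColumn loop keeps the original "col" column next to the new ones, as the output shows. The plain-Python model below (not Spark code; `with_key_columns` and the sample records are hypothetical) reproduces that shape: each record keeps its fields and gains one field per key, with missing keys assumed to become None. One design note from the PySpark documentation for withColumn: each call adds a projection, so calling it in a loop over many keys can produce large query plans, and a single select with all the columns at once avoids that.

```python
def with_key_columns(records, keys, source="col"):
    """Return new records that keep every original field and add one field per key.

    Missing keys become None (assumed to mirror Spark's null for a missing
    map key when ANSI mode is off).
    """
    result = []
    for rec in records:
        new_rec = dict(rec)  # copy, so the input records are left unchanged
        for k in keys:
            new_rec[k] = rec[source].get(k)
        result.append(new_rec)
    return result


records = [{"col": {"a": 1, "b": 2, "c": 3}}, {"col": {"a": 1}}]
for r in with_key_columns(records, ["a", "c", "b"]):
    print(r)
```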