Create an array from a DataFrame containing two columns
I have a DataFrame with the following schema:
root
 |-- _id: long (nullable = true)
 |-- data: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- k: string (nullable = true)
 |    |    |-- v: string (nullable = true)
 |-- c: string (nullable = true)
df.show(5)
---------------------------------------
_id | data |c
1 |[[key1,key2,key3,key4,key5],[value1,value2,value3,value4,value5]] |c1
-----------------------------------------------------------------------------
2 |[ [key1,key3,key2,key6],[value11,value31,value2,value61] ] |c2
-----------------------------------------------------------------------------
3 | [[key7,key1,key3,key8,key9],[value7,value1,value3,value8,value91]]|c3
-----------------------------------------------------------------------------
4 |[key3,key2,key4,key5,key10],[value32,value23,value43,value10]] |c4
------------------------------------------------------------------------------
5 |[[key1 ,key2,key4,key10],[value1,value23,value42,value101]] |c1
.
.
.
.
I would like to know whether it is possible to get the following result, and how I should go about it:
_id|key1 |key2 |key3 |key4 |key5 |key6 |key7 |key8 |key9 |key10 ...
1|value1 |value2 |value3 |value4 |value5 | | | | |
----------------------------------------------------------------------------
2|value11|value2 |value31 | | |value6 | | |
---------------------------------------------------------------------
3|value1 | |value3 | | | |value7 |value8 |value91|
----------------------------------------------------------------------------
4| |value23|value32|value43| | | | |value10
---------------------------------------------------------------------------
5|value1 |value23| |value42| | | | | |value101
.
.
I tried using explode but did not get the result I wanted. I also tried to build an array from the first two columns, but that seems difficult.
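You need to map this DataFrame to one in which each row contains only the data values; you can then create a new DataFrame with the appropriate column names.
This should point you in the right direction...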
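# Assumes `spark` and `sc` already exist and `data` is array<struct<k, v>>, as in the schema.
rows = df.select("data").collect()
# Column names are taken from the keys of the first row only.
column_names = [kv["k"] for kv in rows[0][0]]
# One list of values per row; this lines up only if every row has the same keys in the same order.
data_rows = [[kv["v"] for kv in row[0]] for row in rows]
data_par = sc.parallelize(data_rows)
new_df = spark.createDataFrame(data_par, column_names, samplingRatio=0.1)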
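
The rows in the question do not all carry the same keys, so taking column names from the first row alone would misalign the values. Below is a minimal plain-Python sketch of the reshaping itself, not Spark code: it collects the union of keys as columns and leaves missing keys as None. The sample `records`, `natural_key`, and `pivot_records` are hypothetical names made up for illustration. In Spark, the same idea is usually expressed by exploding `data` and then pivoting on the key column, but that is not shown here.

```python
# Hypothetical sample mirroring the question: each record is (_id, keys, values).
records = [
    (1, ["key1", "key2", "key3"], ["value1", "value2", "value3"]),
    (2, ["key1", "key3", "key6"], ["value11", "value31", "value61"]),
]


def natural_key(name):
    """Sort key that places 'key10' after 'key9' by comparing the numeric suffix."""
    prefix = name.rstrip("0123456789")
    suffix = name[len(prefix):]
    return (prefix, int(suffix) if suffix else -1)


def pivot_records(records):
    """Turn (_id, keys, values) records into a header plus one wide row per _id."""
    # The union of all keys seen in any record becomes the column list.
    columns = sorted({k for _, keys, _ in records for k in keys}, key=natural_key)
    rows = []
    for _id, keys, values in records:
        lookup = dict(zip(keys, values))  # zip stops at the shorter of the two lists
        # Keys absent from this record become None (an empty cell).
        rows.append([_id] + [lookup.get(c) for c in columns])
    return ["_id"] + columns, rows


header, rows = pivot_records(records)
print(header)
for row in rows:
    print(row)
```

Because `zip` stops at the shorter list, a record with fewer values than keys (like row 4 in the question) silently loses its trailing keys; whether that is acceptable depends on the real data.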