I have a DataFrame with the following schema:
root
 |-- _id: long (nullable = true)
 |-- data: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- k: string (nullable = true)
 |    |    |-- v: string (nullable = true)
 |-- c: string (nullable = true)
df.show(5)
+---+-------------------------------------------------------------------+---+
|_id|data                                                               |c  |
+---+-------------------------------------------------------------------+---+
|1  |[[key1,key2,key3,key4,key5],[value1,value2,value3,value4,value5]]  |c1 |
|2  |[[key1,key3,key2,key6],[value11,value31,value2,value61]]           |c2 |
|3  |[[key7,key1,key3,key8,key9],[value7,value1,value3,value8,value91]] |c3 |
|4  |[[key3,key2,key4,key5,key10],[value32,value23,value43,value10]]    |c4 |
|5  |[[key1,key2,key4,key10],[value1,value23,value42,value101]]         |c1 |
+---+-------------------------------------------------------------------+---+
...
I want to know whether it is possible to get the following result, and how I should proceed:
_id|key1   |key2   |key3   |key4   |key5   |key6   |key7   |key8   |key9   |key10 ...
1  |value1 |value2 |value3 |value4 |value5 |       |       |       |       |
2  |value11|value2 |value31|       |       |value61|       |       |       |
3  |value1 |       |value3 |       |       |       |value7 |value8 |value91|
4  |       |value23|value32|value43|       |       |       |       |       |value10
5  |value1 |value23|       |value42|       |       |       |       |       |value101
...
I tried to use explode but I didn't get the result I wanted. I also tried to construct an array from the first two columns, but it seems difficult.
You need to map this DataFrame to one where each row contains only the values, then create a new DataFrame with the keys as the column names.
This should point you in the right direction...
# Assumes each row's "data" holds a [keys, values] pair, as in the
# sample output above, and that every row uses the same key order.
rows = df.select("data").collect()
column_names = list(rows[0][0][0])       # keys taken from the first row
values = [list(r[0][1]) for r in rows]   # the values list from each row
data_par = sc.parallelize(values)
new_df = spark.createDataFrame(data_par, column_names)