
Spark Dataframe - Computation of pairs between columns (Scala)

I have the following situation: I have a dataframe with an 'id' column and an array column as the schema. Now, for each array, I want to get all pairs with the corresponding id and save the result in a dataframe again. For example:

This is the original dataframe:

+---+----------+
| id|candidates|
+---+----------+
|  1|    [2, 3]|
|  2|       [3]|
+---+----------+

And this is how it should look after the computation:

+---+---+
|id1|id2|
+---+---+
|  1|  2|
|  1|  3|
|  2|  3|
+---+---+

Does anyone have an idea for this problem?

Ok, thanks @cheseaux, I found the answer! There is simply the explode_outer function:

    candidatesDF.withColumn("candidates", explode_outer($"candidates")).show

Simply explode the array column:

    candidatesDF.withColumn("id2", explode('candidates)).withColumnRenamed("id", "id1")
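To put the answers above together, here is a minimal, self-contained sketch (assuming a local Spark session; the names `candidatesDF` and `pairsDF` follow the thread, the latter being hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("candidate-pairs")
  .getOrCreate()
import spark.implicits._

// Rebuild the example dataframe from the question.
val candidatesDF = Seq(
  (1, Seq(2, 3)),
  (2, Seq(3))
).toDF("id", "candidates")

// explode emits one row per array element; renaming 'id' and dropping
// the original array column yields the two-column (id1, id2) schema.
val pairsDF = candidatesDF
  .withColumn("id2", explode($"candidates"))
  .withColumnRenamed("id", "id1")
  .drop("candidates")

pairsDF.show()
```

Note that `explode` drops rows whose array is empty or null; `explode_outer` would keep such rows with a null `id2` instead, which is the difference between the two answers.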
