I have the following DataFrame:
+---+--------+---------+-------+
|age|children|education| income|
+---+--------+---------+-------+
| 50|       2|     null|   null|
| 34|       4|     null|   null|
| 34|    null|     true|60000.0|
| 32|    null|    false|35000.0|
+---+--------+---------+-------+
I want output like this:
+---+--------+---------+-------+
|age|children|education| income|
+---+--------+---------+-------+
| 50|       2|     null|   null|
| 34|       4|     true|60000.0|
| 32|    null|    false|35000.0|
+---+--------+---------+-------+
As you can see, age 34 appears in two rows, so I want to merge those rows into one, taking for each column the non-null value from whichever row has it.
Thanks
If you need the first non-null value in each group, this can be achieved with the "first" function and ignoreNulls = true:
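import org.apache.spark.sql.functions.first
// Needed for toDF; assumes a SparkSession named `spark`, as in spark-shell.
import spark.implicits._

// Sample data: Option values become nullable columns.
val df = Seq(
  (50, Some(2), None, None),
  (34, Some(4), None, None),
  (34, None, Some(true), Some(60000.0)),
  (32, None, Some(false), Some(35000.0))
).toDF("age", "children", "education", "income")

// Group rows by age and keep the first non-null value of each other column.
val result = df
  .groupBy("age")
  .agg(
    first("children", ignoreNulls = true).alias("children"),
    first("education", ignoreNulls = true).alias("education"),
    first("income", ignoreNulls = true).alias("income")
  )

result.orderBy("age").show(false)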
Output:
+---+--------+---------+-------+
|age|children|education|income |
+---+--------+---------+-------+
|32 |null |false |35000.0|
|34 |4 |true |60000.0|
|50 |2 |null |null |
+---+--------+---------+-------+
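If the DataFrame has many columns, listing each "first" call by hand gets tedious. Below is a minimal sketch of the same idea that builds the aggregations from the column names. The helper name mergeByKey is hypothetical (it is not part of Spark), the local SparkSession is created only so the snippet can run on its own, and the helper assumes there is at least one column besides the key.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

// Local session for illustration only; in spark-shell a `spark` session already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("merge-rows-sketch")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question, with the element type spelled out.
val df = Seq[(Int, Option[Int], Option[Boolean], Option[Double])](
  (50, Some(2), None, None),
  (34, Some(4), None, None),
  (34, None, Some(true), Some(60000.0)),
  (32, None, Some(false), Some(35000.0))
).toDF("age", "children", "education", "income")

// Hypothetical helper: merge rows sharing `key`, keeping the first
// non-null value of every other column.
// Assumes `df` has at least one column besides `key`.
def mergeByKey(df: DataFrame, key: String): DataFrame = {
  val aggs: Seq[Column] = df.columns
    .filter(_ != key)
    .toSeq
    .map(c => first(c, ignoreNulls = true).alias(c))
  df.groupBy(key).agg(aggs.head, aggs.tail: _*)
}

val merged = mergeByKey(df, "age")
merged.orderBy("age").show(false)
```

For the sample data this should match the result shown above. Note that Spark documents "first" as non-deterministic, because its result depends on row order, which is not guaranteed after a shuffle. If a group holds several different non-null values in the same column, which one is kept is not guaranteed.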