Get the first value of a column matching a condition when grouping a Spark DataFrame
First of all, I apologize if my English is not good. I am a Spark beginner. I have a DataFrame "raw":
+------------------------+----+------------------------+---+------+
|id |name|phone |sex|source|
+------------------------+----+------------------------+---+------+
|gEzIl5K+6n6GPLD0pAQKFA==|alex|na |M |1 |
|gEzIl5K+6n6GPLD0pAQKFA==|alex|+Uy8Ol77OWiSuuapn5FOUg==|na |2 |
+------------------------+----+------------------------+---+------+
'na' is the default string value. source is the priority, with 1 > 2.
I expect this result:
+------------------------+----+------------------------+---+------+
|id |name|phone |sex|source|
+------------------------+----+------------------------+---+------+
|gEzIl5K+6n6GPLD0pAQKFA==|alex|+Uy8Ol77OWiSuuapn5FOUg==|M |1 |
+------------------------+----+------------------------+---+------+
I tried:
val rs = raw.orderBy(source)
  .groupBy(col("id"))
  .agg(first(when(col("phone") === "na" || col("phone") === "", col("phone"))).as("phone"),
       first(when(col("sex") === "na" || col("sex") === "", col("sex"))).as("sex"),
       first(when(col("source") === "na" || col("source") === "", col("source"))).as("source"))
But the result is not correct. I hope someone can help. Many thanks!
Try this.
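// Assumes spark-shell, or `import spark.implicits._` for the 'col symbol syntax
// and `import org.apache.spark.sql.functions._` for col, min and when.
df.orderBy("source")
  .groupBy(col("id"))
  // when(...) without otherwise yields null for "na"/"" values, and min skips nulls
  .agg(min(when(!'phone.isin("na", ""), 'phone)).as("phone"),
       min(when(!'sex.isin("na", ""), 'sex)).as("sex"),
       min(when(!'source.isin("na", ""), 'source)).as("source"))
  .show()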
+--------------------+--------------------+---+------+
| id| phone|sex|source|
+--------------------+--------------------+---+------+
|gEzIl5K+6n6GPLD0p...|+Uy8Ol77OWiSuuapn...| M| 1|
+--------------------+--------------------+---+------+
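One caveat with `min`: it returns the smallest valid value in each group, not the value from the highest-priority source. In the sample each column has only one valid value per id, so the two coincide. If several sources can hold valid values, a possible variation is to take the minimum of a `(source, value)` struct, so the row with the smallest `source` wins. The sketch below is only an illustration under stated assumptions: the object name, `firstValid` and `pickByPriority` are hypothetical helpers, `source` is assumed to be an integer column, and the session runs in local mode.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, min, struct, when}

object FirstValidByPriority {
  // Hypothetical helper: value of column `c` taken from the row with the
  // smallest `source` among rows where `c` is neither "na" nor "".
  // Structs are compared field by field, so `source` decides first;
  // rows with an invalid value become null and are skipped by min.
  def firstValid(c: String): Column =
    min(when(!col(c).isin("na", ""), struct(col("source"), col(c))))
      .getField(c)
      .as(c)

  // Hypothetical helper: one row per id, each column filled independently.
  def pickByPriority(raw: DataFrame): DataFrame =
    raw.groupBy("id").agg(
      firstValid("name"),
      firstValid("phone"),
      firstValid("sex"),
      min("source").as("source"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("first-valid-by-priority")
      .getOrCreate()
    import spark.implicits._

    // The two sample rows from the question.
    val raw = Seq(
      ("gEzIl5K+6n6GPLD0pAQKFA==", "alex", "na", "M", 1),
      ("gEzIl5K+6n6GPLD0pAQKFA==", "alex", "+Uy8Ol77OWiSuuapn5FOUg==", "na", 2)
    ).toDF("id", "name", "phone", "sex", "source")

    pickByPriority(raw).show(false)
    spark.stop()
  }
}
```

For the sample rows this should keep the phone from source 2 and the sex from source 1, matching the expected result in the question. A column with no valid value in a group comes back as null rather than 'na'.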