
Setting a column value based on column values across the complete DataFrame in Spark Scala

Below is the dataset for my use case:

+-----------+-------------+------------+
|artistId   |musicalGroups|displayName |
+-----------+-------------+------------+
|wa_16      |wa_31        |Exods       |
|wa_38      |wa_16        |Kirk        |
+-----------+-------------+------------+

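I want to populate a name column based on the musicalGroups value: for each row, find the row whose artistId equals that musicalGroups value and take its displayName. In the example below, wa_16 is the artistId of the artist named Exods, so the row whose musicalGroups is wa_16 should get Exods in its name column. Rows whose musicalGroups matches no artistId get null. Example: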

+-----------+-------------+------------+------+
|artistId   |musicalGroups|displayName |name  |
+-----------+-------------+------------+------+
|wa_16      |wa_31        |Exods       |null  |
|wa_38      |wa_16        |Kirk        |Exods |
+-----------+-------------+------------+------+

I tried a self join on artistId and musicalGroups, but it did not work.
Can someone help me solve this use case?
 
import org.apache.spark.sql.functions.col

// df is your existing DataFrame (columns: artistId, musicalGroups, displayName)

// Derive a lookup dataset from the original one; renaming its columns avoids
// ambiguous column references when it is joined back to df
val newDF = df.select(col("artistId").as("groupId"), col("displayName").as("name")).distinct()

// Left outer join on the common key: rows whose musicalGroups matches no artistId get name = null
val combinedDF = df
  .join(newDF, col("musicalGroups") === col("groupId"), "left_outer")
  .select("artistId", "musicalGroups", "displayName", "name")
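
As a Spark-free way to check what the left outer join above should return on the sample rows, here is a minimal plain-Python sketch of the same lookup. The function name `add_group_name` and the dictionary-based rows are illustrative assumptions, not part of the original answer. Note one simplification: the dict keeps a single displayName per artistId, whereas the Spark join would emit one output row per distinct match.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def add_group_name(rows: List[Row]) -> List[Row]:
    """Hypothetical helper: mimic a left outer self-lookup on musicalGroups -> artistId."""
    # Lookup table artistId -> displayName (plays the role of the derived dataset).
    names = {r["artistId"]: r["displayName"] for r in rows}
    # dict.get returns None when there is no match, like the null from a left outer join.
    return [{**r, "name": names.get(r["musicalGroups"])} for r in rows]


rows = [
    {"artistId": "wa_16", "musicalGroups": "wa_31", "displayName": "Exods"},
    {"artistId": "wa_38", "musicalGroups": "wa_16", "displayName": "Kirk"},
]

for row in add_group_name(rows):
    print(row["artistId"], row["musicalGroups"], row["displayName"], row["name"])
```

On the two sample rows, the wa_16 row has no matching group and keeps a missing name, while the wa_38 row picks up Exods, which matches the expected table in the question.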

IIUC, you can use groupBy() with pivot():

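from pyspark.sql import functions as F

# Assumes an existing SparkSession named `spark`
df = spark.createDataFrame([("wa_16", "wa_31", "Exods"), ("wa_38", "wa_16", "Krik")], ["artistId", "musicalGroups", "displayName"])
# One output column per distinct displayName, holding the artistId of the matching row
df_grp = df.groupBy("artistId", "musicalGroups", "displayName").pivot("displayName").agg(F.first(F.col("artistId")))
df.show()
df_grp.show()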

+--------+-------------+-----------+
|artistId|musicalGroups|displayName|
+--------+-------------+-----------+
|   wa_16|        wa_31|      Exods|
|   wa_38|        wa_16|       Krik|
+--------+-------------+-----------+

+--------+-------------+-----------+-----+-----+
|artistId|musicalGroups|displayName|Exods| Krik|
+--------+-------------+-----------+-----+-----+
|   wa_16|        wa_31|      Exods|wa_16| null|
|   wa_38|        wa_16|       Krik| null|wa_38|
+--------+-------------+-----------+-----+-----+
