Scala group by on a Cassandra table column of list type
I want to apply a group by on top_places, which is a list column.
tenant_id | device_id | top_places
-----------+-----------+------------
T1 | D2 | ['F', 'D']
T1 | D3 | ['F', 'D']
T1 | D4 | ['G', 'D']
T1 | D5 | ['G', 'Q']
T1 | D6 | ['A', 'F']
This is the result I get when I run the following Scala snippet:
val results = rows.groupBy("top_places").agg(Map("*"->"count")).withColumnRenamed("COUNT(1)","Total").select("top_places","Total").orderBy("Total")
[List(G, D),1]
[List(A, F),1]
[List(G, Q),1]
[List(F, D),2]
What I need is a count per individual place, as below. How can I get that?
[A,1]
[G,2]
[F,2]
[D,2]
[Q,1]
You are almost there. You just need to flatten top_places with explode() before grouping, so that each list element becomes its own row:
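// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
import org.apache.spark.sql.functions.explode
import spark.implicits._

val rows = Seq(
("T1", "D2", Seq("F", "D")),
("T1", "D3", Seq("F", "D")),
("T1", "D4", Seq("G", "D")),
("T1", "D5", Seq("G", "Q")),
("T1", "D6", Seq("A", "F"))
).toDF("tenant_id", "device_id", "top_places")
// explode() emits one row per list element; the count is then per place.
rows.withColumn("top_place", explode($"top_places")).
groupBy("top_place").agg(Map("*"->"count")).
withColumnRenamed("COUNT(1)","total").
orderBy("total").
show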
// +---------+-----+
// |top_place|total|
// +---------+-----+
// | Q| 1|
// | A| 1|
// | G| 2|
// | F| 3|
// | D| 3|
// +---------+-----+
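To make it easier to see why F and D come out as 3, here is a minimal sketch of the same counting done with plain Scala collections instead of Spark. It is only an illustration of the idea, not the DataFrame API: flatMap stands in for explode(), and groupBy plus size stands in for the grouped count. The names sample, exploded, totals and sorted are made up for this example.

```scala
// Plain Scala collections stand-in for the DataFrame (no Spark needed).
// Each tuple mirrors a sample row: (tenant_id, device_id, top_places).
val sample = Seq(
  ("T1", "D2", Seq("F", "D")),
  ("T1", "D3", Seq("F", "D")),
  ("T1", "D4", Seq("G", "D")),
  ("T1", "D5", Seq("G", "Q")),
  ("T1", "D6", Seq("A", "F"))
)

// flatMap plays the role of explode(): one element per list entry.
val exploded: Seq[String] = sample.flatMap { case (_, _, places) => places }

// groupBy + size plays the role of groupBy("top_place") with a count.
val totals: Map[String, Int] =
  exploded.groupBy(identity).map { case (place, hits) => place -> hits.size }

// Sort by count, then by name, so that ties come out in a stable order.
val sorted: Seq[(String, Int)] =
  totals.toSeq.sortBy { case (place, n) => (n, place) }

sorted.foreach { case (place, n) => println(s"[$place,$n]") }
// [A,1]
// [Q,1]
// [G,2]
// [D,3]
// [F,3]
```

Unlike this sketch, Spark's orderBy("total") gives no guarantee about the order of rows with equal counts, so Q and A (or F and D) may appear in either order there.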
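You can also replace agg(Map("*"->"count")) with agg(count(...)), which lets you name the result column directly with as() instead of renaming it afterwards: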
rows.withColumn("top_place", explode($"top_places")).
groupBy("top_place").agg(count("top_place").as("total")).
orderBy("total")