
Scala: group by on a Cassandra table column of list type

I want to apply a group by on top_places (which is a list).

 tenant_id | device_id | top_places
-----------+-----------+------------
        T1 |        D2 | ['F', 'D']
        T1 |        D3 | ['F', 'D']
        T1 |        D4 | ['G', 'D']
        T1 |        D5 | ['G', 'Q']
        T1 |        D6 | ['A', 'F']

This is the result I get when I run the following Scala code snippet:

val results = rows.groupBy("top_places").agg(Map("*"->"count")).withColumnRenamed("COUNT(1)","Total").select("top_places","Total").orderBy("Total")

[List(G, D),1]                                                                  
[List(A, F),1]
[List(G, Q),1]
[List(F, D),2]
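Why the query above returns one row per distinct list: grouping on top_places uses the whole list value as the key, so ['F', 'D'] and ['G', 'D'] land in different groups even though both contain 'D'. A minimal plain-Scala sketch of that behaviour (no Spark), assuming the table rows are modelled as hypothetical (tenant_id, device_id, top_places) tuples:

```scala
// Hypothetical plain-Scala model of the table: (tenant_id, device_id, top_places).
val rows: Seq[(String, String, List[String])] = Seq(
  ("T1", "D2", List("F", "D")),
  ("T1", "D3", List("F", "D")),
  ("T1", "D4", List("G", "D")),
  ("T1", "D5", List("G", "Q")),
  ("T1", "D6", List("A", "F"))
)

// Grouping on the whole list: the key is the List value itself,
// so List(F, D) and List(G, D) are separate groups.
val countsPerList: Map[List[String], Int] =
  rows.groupBy(_._3).map { case (places, group) => places -> group.size }

// Print in the same [key,count] shape as the question's output.
countsPerList.toSeq.sortBy(_._2).foreach { case (places, n) => println(s"[$places,$n]") }
```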

What I need is the following. How can I get it?

[A,1]
[G,2]
[F,2]
[D,2]
[Q,1]

You're almost there. Just flatten top_places with explode():

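// Imports needed outside spark-shell (harmless inside it); `spark` is the SparkSession.
import org.apache.spark.sql.functions.explode
import spark.implicits._

// Sample data standing in for the Cassandra table.
val rows = Seq(
  ("T1", "D2", Seq("F", "D")),
  ("T1", "D3", Seq("F", "D")),
  ("T1", "D4", Seq("G", "D")),
  ("T1", "D5", Seq("G", "Q")),
  ("T1", "D6", Seq("A", "F"))
).toDF("tenant_id", "device_id", "top_places")

// explode emits one row per list element, so each place is grouped on its own.
rows.withColumn("top_place", explode($"top_places")).
  groupBy("top_place").agg(Map("*" -> "count")).
  withColumnRenamed("count(1)", "total").
  orderBy("total").
  show()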

// +---------+-----+                                                               
// |top_place|total|
// +---------+-----+
// |        Q|    1|
// |        A|    1|
// |        G|    2|
// |        F|    3|
// |        D|    3|
// +---------+-----+
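What explode plus groupBy does, sketched in plain Scala without Spark: explode behaves like flatMap (one output row per list element), and the grouping then runs on the single place. The tuple model of the rows is a hypothetical stand-in for the DataFrame. As in the output above, F and D each appear in three rows of the sample data.

```scala
// Hypothetical plain-Scala model of the table: (tenant_id, device_id, top_places).
val rows: Seq[(String, String, List[String])] = Seq(
  ("T1", "D2", List("F", "D")),
  ("T1", "D3", List("F", "D")),
  ("T1", "D4", List("G", "D")),
  ("T1", "D5", List("G", "Q")),
  ("T1", "D6", List("A", "F"))
)

// "explode": one output row per element of top_places.
val exploded: Seq[(String, String, String)] =
  rows.flatMap { case (tenant, device, places) => places.map(p => (tenant, device, p)) }

// Group by the single place and count the rows in each group.
val totals: Map[String, Int] =
  exploded.groupBy(_._3).map { case (place, group) => place -> group.size }

// Sort by total, then by place name, so ties print in a fixed order.
totals.toSeq.sortBy { case (place, n) => (n, place) }.foreach { case (place, n) => println(s"[$place,$n]") }
// prints [A,1], [Q,1], [G,2], [D,3], [F,3], one per line
```

Breaking ties on the place name is a choice made here for reproducible output; the Spark query orders by total only, so tied rows (such as Q and A) may come out in either order.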

You can also replace agg(Map("*"->"count")) with agg(count(...)), which lets you name the result column directly with .as():

import org.apache.spark.sql.functions.{count, explode}

rows.withColumn("top_place", explode($"top_places")).
  groupBy("top_place").agg(count("top_place").as("total")).
  orderBy("total").
  show()
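The two forms give the same totals here because every exploded top_place is non-NULL. They can differ if an array contains NULL elements: groupBy puts NULLs in a group of their own, count(*) (which is what Map("*" -> "count") computes) counts that group's rows, while count("top_place") skips NULLs and reports 0 for it. A minimal plain-Scala model of those semantics, using made-up data in which None stands in for NULL:

```scala
// Hypothetical exploded top_place values; None stands in for a SQL NULL element.
val exploded: Seq[Option[String]] = Seq(Some("F"), Some("D"), None, Some("F"), None)

// Like groupBy("top_place"): all NULLs (None) end up in one group.
val groups: Map[Option[String], Seq[Option[String]]] = exploded.groupBy(identity)

// count(*) / count(1): number of rows in each group, NULL or not.
val countStar: Map[Option[String], Int] =
  groups.map { case (key, rows) => key -> rows.size }

// count("top_place"): number of non-NULL values in each group.
val countColumn: Map[Option[String], Int] =
  groups.map { case (key, rows) => key -> rows.count(_.isDefined) }

println(s"count(*):         $countStar")
println(s"count(top_place): $countColumn")
```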


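Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.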

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM