Spark (Scala): count all the distinct values of an entire column on an RDD

Question

I have this RDD:

val resultRdd: RDD[(VertexId, String, Seq[Long])]

I want to count the distinct values that appear in the Seq field across all records.

For example, if I have 3 records with Seq values as follows:

VertexId    String    Seq[Long]
1           x         1, 3
2           x         1, 5
3           x         2, 3, 6

The result should be 5, the size of the set {1, 3, 5, 2, 6}.

Thanks :)

Answer 1

resultRdd.flatMap(_._3).distinct().count()
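
Here flatMap(_._3) flattens the third field of every tuple into a single RDD[Long], distinct() drops the repeated values, and count() returns how many remain. The following is a minimal sketch that runs this one-liner on the three sample records from the question. It assumes a local Spark master, and the object name DistinctSeqValues and the helper countDistinct are made up for illustration. Long stands in for VertexId, which GraphX defines as a type alias for Long.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper object, not part of the original answer.
object DistinctSeqValues {
  // Flatten the Seq field of every record, drop duplicates, count what is left.
  def countDistinct(rdd: RDD[(Long, String, Seq[Long])]): Long =
    rdd.flatMap(_._3).distinct().count()

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; a real job would get its master from spark-submit.
    val conf = new SparkConf().setAppName("distinct-seq-values").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // The three sample records from the question.
      val resultRdd: RDD[(Long, String, Seq[Long])] = sc.parallelize(Seq(
        (1L, "x", Seq(1L, 3L)),
        (2L, "x", Seq(1L, 5L)),
        (3L, "x", Seq(2L, 3L, 6L))
      ))
      println(countDistinct(resultRdd)) // prints 5
    } finally {
      sc.stop()
    }
  }
}
```

Note that distinct() involves a shuffle, so on a large RDD this is the expensive step of the pipeline.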

Answer 1: score 6, accepted, 2016-03-13 09:02:31
