
Intersection of Two Map RDDs in Scala

I have two RDDs, for example: firstmapRDD - (0-14,List(0,4,19,19079,42697,444,42748))

secondmapRdd - (0-14,List(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94))

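I want to find the intersection. I tried var interResult = firstmapRDD.intersection(secondmapRdd), but it shows no results in the output file.
I also tried combining by key with mapRDD.cogroup(secondMapRDD).filter(x => ...), but I don't know how to find the intersection between the two values. Is it x => x._1.intersect(x._2)? Can someone help me with the syntax?
Even this throws a compile-time error: mapRDD.cogroup(secondMapRDD).filter(x => x._1.intersect(x._2))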

 val mapRDD = sc.parallelize(map.toList)
 val secondMapRDD = sc.parallelize(secondMap.toList)
 val interResult = mapRDD.intersection(secondMapRDD)
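A plain-Scala sketch of why intersection returns nothing here, assuming the values are List[Int] as in the sample records. RDD.intersection compares whole elements, and in a pair RDD each element is the full (key, list) tuple, so two records with the same key but different lists never match. The sketch uses Set.intersect as a local stand-in for RDD.intersection so it runs without Spark; PairIntersectionDemo is a hypothetical name.

```scala
object PairIntersectionDemo {
  // The question's sample records, as (key, list) pairs.
  val first: Set[(String, List[Int])] =
    Set("0-14" -> List(0, 4, 19, 19079, 42697, 444, 42748))
  val second: Set[(String, List[Int])] =
    Set("0-14" -> (1 to 94).toList)

  // Like RDD.intersection, Set.intersect keeps only elements that are equal
  // as a whole: the key AND the entire list must match.
  val common: Set[(String, List[Int])] = first.intersect(second)

  def main(args: Array[String]): Unit =
    println(common) // prints Set()
}
```

Because the lists differ, the tuples differ, and the result is empty even though the keys match.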

Maybe intersection doesn't work because of the ArrayBuffer[List[]] values. Is there any hack to remove them?

I tried doing this:

val interResult = mapRDD.cogroup(secondMapRDD)
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, r)) => (k, l.toList.intersect(r.toList)) }

Still an empty list!
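A sketch of the per-key step the cogroup attempt needs, assuming both RDDs hold (String, List[Int]) pairs. After cogroup, each record has the shape (key, (Iterable[List[Int]], Iterable[List[Int]])): x._1 is the key, and each side is a collection of whole lists, not of numbers. That is why l.toList.intersect(r.toList) compares entire lists and usually comes back empty, and why filter(x => x._1.intersect(x._2)) does not compile (filter needs a function returning Boolean, and x._1 is the key). Flattening each side first gives an element-wise intersection. CogroupFlattenDemo, Grouped and intersectGroup are hypothetical names, and the record is built by hand to mimic cogroup output so the sketch runs without Spark.

```scala
object CogroupFlattenDemo {
  // Assumed shape of one record produced by rdd1.cogroup(rdd2) when both
  // RDDs hold (String, List[Int]) pairs.
  type Grouped = (String, (Iterable[List[Int]], Iterable[List[Int]]))

  // Flatten each side to its individual numbers, then intersect them.
  // Without flatten, intersect would compare whole List[Int] values.
  def intersectGroup(record: Grouped): (String, List[Int]) = record match {
    case (key, (left, right)) =>
      (key, left.flatten.toList.intersect(right.flatten.toList))
  }

  // Hand-built record mimicking cogroup output for the question's data.
  val sample: Grouped =
    ("0-14", (Seq(List(0, 4, 19, 19079, 42697, 444, 42748)), Seq((1 to 94).toList)))

  def main(args: Array[String]): Unit =
    println(intersectGroup(sample)) // prints (0-14,List(4, 19))
}
```

In the Spark pipeline this corresponds to keeping the filter step and changing the map step to map { case (k, (l, r)) => (k, l.flatten.toList.intersect(r.flatten.toList)) }.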

Since you want to intersect on the values, you need to join the two RDDs to get all the matched values, and then intersect the values.

Sample code:

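  // Assumes an existing SparkContext named sparkContext
  val firstMap = Map(1 -> List(1, 2, 3, 4, 5))
  val secondMap = Map(1 -> List(1, 2, 5))

  // Turn each Map into a pair RDD of (key, list) with 2 partitions
  val firstKeyRDD = sparkContext.parallelize(firstMap.toList, 2)
  val secondKeyRDD = sparkContext.parallelize(secondMap.toList, 2)

  // join pairs up the lists that share a key: (key, (firstList, secondList))
  val joinedRDD = firstKeyRDD.join(secondKeyRDD)
  val finalResult = joinedRDD.map { case (key, (firstList, secondList)) =>
    (key, firstList.intersect(secondList))
  }

  // collect() brings the results to the driver so println output appears there
  finalResult.collect().foreach(println)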

The output will be:

(1,List(1, 2, 5))
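To check the join-then-intersect logic without a cluster, here is a plain-Scala sketch on local Maps using the question's own sample data. It mimics an inner join by keeping only keys present in both maps, then intersects the two lists element by element. Note that Spark's join would additionally emit one pair per combination if a key occurred several times in either RDD, which a Map cannot represent. LocalJoinIntersectDemo is a hypothetical name.

```scala
object LocalJoinIntersectDemo {
  val firstMap: Map[String, List[Int]] =
    Map("0-14" -> List(0, 4, 19, 19079, 42697, 444, 42748))
  val secondMap: Map[String, List[Int]] =
    Map("0-14" -> (1 to 94).toList)

  // Inner-join style: skip keys missing from secondMap, then intersect
  // the two value lists for each shared key.
  val result: Map[String, List[Int]] =
    firstMap.collect {
      case (key, left) if secondMap.contains(key) =>
        key -> left.intersect(secondMap(key))
    }

  def main(args: Array[String]): Unit =
    result.foreach(println) // prints (0-14,List(4, 19))
}
```

With the question's data, only 4 and 19 from the first list fall within 1 to 94.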


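Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.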

 
© 2020-2024 STACKOOM.COM