Search an RDD for values from another RDD
I am using Spark with Scala. My rdd1 holds customer info, i.e. (id, [name, address]). rdd2 holds only the names of high-profile customers. I want to find out whether each customer in rdd1 is high profile or not. How can I search one RDD using another? Joining the RDDs does not look like a good solution to me.
My code:
val result = rdd1.map( case (id, customer) =>
customer.foreach ( c =>
rdd2.filter(_ == c._1).count()!=0 ))
Error: org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations
You have to collect one RDD on the driver and broadcast it. Broadcast the smaller RDD to improve performance.
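// Collect the smaller RDD on the driver and broadcast it as a Set for fast lookups
val bcastNames = sc.broadcast(rdd2.collect().toSet)
// Assumes rdd1 elements have the shape (id, (name, address))
val result = rdd1.map { case (id, (name, address)) =>
  (id, (name, address), bcastNames.value.contains(name))
}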
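The broadcast approach above can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation: the object name `BroadcastLookupSketch`, the helper `flagHighProfile`, the `(Int, (String, String))` element type, and the sample data are all assumptions made for illustration, and it only makes sense when rdd2 is small enough to fit in driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastLookupSketch {
  // Hypothetical helper: flags each customer by set membership in a broadcast
  // copy of the high-profile names. Assumes customers are (id, (name, address)).
  def flagHighProfile(
      sc: SparkContext,
      customers: RDD[(Int, (String, String))],
      highProfile: RDD[String]): RDD[(Int, (String, String, Boolean))] = {
    // collect() runs on the driver; the resulting Set is shipped to each executor once.
    val names = sc.broadcast(highProfile.collect().toSet)
    customers.map { case (id, (name, address)) =>
      (id, (name, address, names.value.contains(name)))
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-lookup-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data standing in for rdd1 and rdd2.
      val rdd1 = sc.parallelize(Seq(
        (1, ("Alice", "1 Main St")),
        (2, ("Bob", "2 Oak Ave"))
      ))
      val rdd2 = sc.parallelize(Seq("Alice"))
      flagHighProfile(sc, rdd1, rdd2).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Using a `Set` rather than an `Array` keeps each lookup roughly constant time, which matters because the check runs once per customer record.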
You can use a left outer join to avoid an expensive operation such as collect (if your RDDs are big). Also, as Daniel pointed out, a broadcast is not necessary.
Here is a snippet that produces rdd1 with a flag denoting whether each customer is high profile or low profile.
val highProfileFlag = 1
val lowProfileFlag = 0
// Keying rdd 1 by the name
val rdd1Keyed = rdd1.map { case (id, (name, address)) => (name, (id, address)) }
// Keying rdd 2 by the name and adding a high profile flag
val rdd2Keyed = rdd2.map { case name => (name, highProfileFlag) }
// The join you are looking for is the left outer join
val rdd1HighProfileFlag = rdd1Keyed
  .leftOuterJoin(rdd2Keyed)
  // leftOuterJoin yields (key, (leftValue, Option[rightValue]))
  .map { case (name, ((id, address), highProfileOpt)) =>
    val profileFlag = highProfileOpt.getOrElse(lowProfileFlag)
    (id, (name, address, profileFlag))
  }
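One thing to watch with the join: if rdd2 contains the same name more than once, the left outer join emits one output row per match, so a customer would appear several times. A minimal sketch that guards against this with `distinct()` is shown here; the object name `JoinLookupSketch`, the helper `flagByJoin`, the Boolean flag, and the element types are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinLookupSketch {
  // Hypothetical helper: flags customers via a left outer join on the name.
  // Assumes customers are (id, (name, address)).
  def flagByJoin(
      customers: RDD[(Int, (String, String))],
      highProfile: RDD[String]): RDD[(Int, (String, String, Boolean))] = {
    val byName = customers.map { case (id, (name, address)) => (name, (id, address)) }
    // distinct() prevents duplicated names from multiplying customer rows in the join.
    val flags = highProfile.distinct().map(name => (name, true))
    byName.leftOuterJoin(flags).map { case (name, ((id, address), flag)) =>
      // flag is None when the name has no match in highProfile.
      (id, (name, address, flag.getOrElse(false)))
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join-lookup-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data; "Alice" is deliberately listed twice.
      val rdd1 = sc.parallelize(Seq((1, ("Alice", "1 Main St")), (2, ("Bob", "2 Oak Ave"))))
      val rdd2 = sc.parallelize(Seq("Alice", "Alice"))
      flagByJoin(rdd1, rdd2).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the join key is the name, two different customers who share a name are both flagged; whether that is acceptable depends on the data.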