
Search an RDD for values from another RDD

I am using Spark with Scala. My rdd1 holds customer info in the form ( id , [name, address] ). rdd2 holds only the names of high-profile customers. I want to determine whether each customer in rdd1 is high profile or not. How can I search one RDD using another? Joining the RDDs does not look like a good solution to me.

My code:

val result = rdd1.map( case (id, customer) => 
  customer.foreach ( c => 
    rdd2.filter(_ == c._1).count()!=0 ))

Error: org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations

You have to collect one of the RDDs and broadcast it. As the error says, an RDD cannot be used inside another RDD's transformation; collecting it to the driver turns it into a plain local collection that can be referenced inside map. Broadcast the smaller RDD to improve performance.

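// Assumes each rdd1 element has the shape (id, (name, address)) and that rdd2 fits in driver memory.
val bcastNames = sc.broadcast(rdd2.collect().toSet)
val result = rdd1.map { case (id, (name, address)) =>
  // true if the customer's name is in the high-profile set
  (id, (name, address), bcastNames.value.contains(name))
}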
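
To try the broadcast approach on a small data set, it can be sketched end to end roughly as follows. The sample customers, the `local[*]` master, and the names `BroadcastLookup` and `flagHighProfile` are assumptions made for illustration, and rdd1 is assumed to have the shape (id, (name, address)).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastLookup {
  // Flags each customer by looking its name up in a broadcast set of high-profile names.
  def flagHighProfile(sc: SparkContext,
                      rdd1: RDD[(Int, (String, String))],
                      rdd2: RDD[String]): RDD[(Int, (String, String, Boolean))] = {
    // collect() pulls rdd2 to the driver, so rdd2 must be small enough to fit in memory
    val names = sc.broadcast(rdd2.collect().toSet)
    rdd1.map { case (id, (name, address)) =>
      (id, (name, address, names.value.contains(name)))
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-lookup").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data
      val rdd1 = sc.parallelize(Seq((1, ("Alice", "1 Main St")), (2, ("Bob", "2 Oak Ave"))))
      val rdd2 = sc.parallelize(Seq("Alice"))
      flagHighProfile(sc, rdd1, rdd2).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Converting the collected names to a Set makes each lookup a constant-time membership check instead of a scan over the whole array for every customer.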

You can use a left outer join to avoid an expensive operation such as collect (if your RDDs are big).

Also, as Daniel pointed out, a broadcast is not necessary.

Here is a snippet that produces rdd1 with a flag denoting whether each customer is a high-profile or a low-profile customer.

val highProfileFlag = 1
val lowProfileFlag = 0 

// Keying rdd 1 by the name    
val rdd1Keyed = rdd1.map { case (id, (name, address)) => (name, (id, address)) }

// Keying rdd 2 by the name and adding a high profile flag
val rdd2Keyed = rdd2.map { case name => (name, highProfileFlag) }

// The join you are looking for is the left outer join
val rdd1HighProfileFlag = rdd1Keyed
  .leftOuterJoin(rdd2Keyed)
  .map { case (name, ((id, address), highProfileOpt)) =>
    // leftOuterJoin yields an Option; None means the name was not found in rdd2
    val profileFlag = highProfileOpt.getOrElse(lowProfileFlag)
    (id, (name, address, profileFlag))
  }
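
The join can be checked on a small data set with a sketch like the one below. The sample data, the `local[*]` master, and the names `JoinLookup` and `flagHighProfile` are assumptions for illustration. The `distinct()` call is an addition not in the snippet above: a left outer join emits one output row per matching key on the right side, so a name repeated in rdd2 would otherwise duplicate that customer in the result.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinLookup {
  val HighProfile = 1
  val LowProfile = 0

  // Flags each customer via a left outer join on the customer name.
  def flagHighProfile(rdd1: RDD[(Int, (String, String))],
                      rdd2: RDD[String]): RDD[(Int, (String, String, Int))] = {
    val byName = rdd1.map { case (id, (name, address)) => (name, (id, address)) }
    // distinct() guards against duplicate names multiplying the joined rows
    val flags = rdd2.distinct().map(name => (name, HighProfile))
    byName.leftOuterJoin(flags).map { case (name, ((id, address), flag)) =>
      (id, (name, address, flag.getOrElse(LowProfile)))
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join-lookup").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data; "Alice" appears twice in rdd2 on purpose
      val rdd1 = sc.parallelize(Seq((1, ("Alice", "1 Main St")), (2, ("Bob", "2 Oak Ave"))))
      val rdd2 = sc.parallelize(Seq("Alice", "Alice"))
      flagHighProfile(rdd1, rdd2).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Unlike the broadcast version, nothing here is collected to the driver, at the cost of a shuffle for the join.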

