
In Scala using RDD, how do you apply a function to the Iterable in an RDD[(K, Iterable[V])]?

I am trying to find an efficient way to apply multiple filters to the Iterable[Value] of a given RDD[(Key, Iterable[Value])].

The reason is that I want to filter the RDD and eventually find the keys that match the filter.

Example of the RDD:

 (000473643-02,CompactBuffer((glucose,80.0), (glucose,80.0), (glucose2,80.0), (fasting blood glucose,80.0), (glucose,80.0), (glucose,80.0), (glucose,80.0), (glucose,80.0)))
 (713003448-01,CompactBuffer((glucose,80.0), (glucose,80.0)))
 (000023838-01,CompactBuffer((glucose,80.0), (glucose,80.0)))
 (000772974-01,CompactBuffer((glucose,80.0), (glucose,80.0), (glucose,80.0)))
 (380670000-01,CompactBuffer((glucose,80.0), (glucose,80.0)))

So in this case, I need to output the key only when the following is true:

    glucose value is >= 80 or fasting blood glucose >= 80 

I would use something like this:

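// A condition is violated when the value reaches or exceeds the threshold.
case class ExceedsCondition(threshold: Double) {
  def violates(value: Double): Boolean = value >= threshold
}

// Broadcast the lookup table once so every executor can reuse it.
// The map keys must match the measurement names in the data exactly.
val conditionsBroadcast = sc.broadcast(
  Map("glucose" -> ExceedsCondition(80.0),
      "fasting-glucose" -> ExceedsCondition(81.0))
)

val rdd = sc.parallelize(List("key1" -> List("glucose" -> 83.0)))

// Keep a (key, values) pair if at least one value violates its condition.
val result = rdd.filter { case (_, xs) =>
  val conditions = conditionsBroadcast.value
  xs.exists { case (key, value) =>
    conditions.get(key).exists(_.violates(value))
  }
}

result.take(10)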
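
Since the question asks for the matching keys only, here is a minimal sketch of that last step. It uses a plain Scala Seq as a stand-in for the RDD so it runs without Spark; the same filter and map calls exist on an RDD. The names Condition, conditions, grouped and matchingKeys, and all sample values, are assumptions made up for illustration. The condition names follow the sample data in the question.

```scala
// Plain-Scala sketch: a Seq stands in for the RDD so this runs without Spark.
// Condition, conditions, grouped and matchingKeys are illustrative names only.
final case class Condition(threshold: Double) {
  def matches(value: Double): Boolean = value >= threshold
}

// One condition per measurement name, spelled as in the question's sample data.
val conditions: Map[String, Condition] = Map(
  "glucose" -> Condition(80.0),
  "fasting blood glucose" -> Condition(80.0)
)

// Same shape as RDD[(Key, Iterable[Value])], with made-up values.
val grouped: Seq[(String, Iterable[(String, Double)])] = Seq(
  "key1" -> List("glucose" -> 83.0, "glucose" -> 70.0),
  "key2" -> List("glucose" -> 75.0),
  "key3" -> List("fasting blood glucose" -> 80.0),
  "key4" -> List("cholesterol" -> 200.0)
)

// Keep a pair if any value meets its condition, then drop the values.
val matchingKeys: Seq[String] = grouped
  .filter { case (_, values) =>
    values.exists { case (name, value) =>
      conditions.get(name).exists(_.matches(value))
    }
  }
  .map { case (key, _) => key }

println(matchingKeys) // List(key1, key3)
```

Measurement names without a condition (here "cholesterol") never match, because conditions.get returns None for them.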

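Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish one, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.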

 