In Scala with Spark RDDs, how do you apply a function to the Iterable in an RDD[(K, Iterable[V])]?
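I am trying to find an efficient way to apply one or more filters to the Iterable[Value] of a given RDD[(Key, Iterable[Value])].
The reason is that I want to filter the RDD and end up with the keys whose values match the filters.
Example RDD: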
(000473643-02,CompactBuffer((glucose,80.0), (glucose,80.0), (glucose2,80.0), (fasting blood glucose,80.0), (glucose,80.0), (glucose,80.0), (glucose,80.0), (glucose,80.0)))
(713003448-01,CompactBuffer((glucose,80.0), (glucose,80.0)))
(000023838-01,CompactBuffer((glucose,80.0), (glucose,80.0)))
(000772974-01,CompactBuffer((glucose,80.0), (glucose,80.0), (glucose,80.0)))
(380670000-01,CompactBuffer((glucose,80.0), (glucose,80.0)))
So in this case, a key should be output only when the following condition is met:
glucose value is >= 80 or fasting blood glucose >= 80
I would use something like this:
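// A reading violates the condition when it is at or above the threshold.
case class ExceedsCondition(threshold: Double) {
  def violates(value: Double): Boolean = value >= threshold
}

// Broadcast the condition table once so every executor can read it.
// Lookup is by exact label, so the map keys must match the labels in the data.
val conditionsBroadcast = sc.broadcast(
  Map(
    "glucose" -> ExceedsCondition(80.0),
    "fasting-glucose" -> ExceedsCondition(81.0)
  )
)

val rdd = sc.parallelize(List("key1" -> List("glucose" -> 83.0)))

// Keep a (key, readings) pair if at least one reading violates its condition.
val result = rdd.filter { case (_, xs) =>
  val conditions = conditionsBroadcast.value
  xs.exists { case (key, value) =>
    conditions.get(key).exists(_.violates(value))
  }
}

result.take(10)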
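The filter above keeps whole (key, readings) pairs, while the question asks only for the keys. The following is a minimal sketch of that last step, written against plain Scala collections so it runs without a Spark context. The object name, the helper names violatesAny and matchingKeys, the condition table, and the sample data are all assumptions made up for illustration; they are not taken from the original post.

```scala
object ThresholdFilterSketch {
  // Same shape as the condition class in the answer: a reading violates
  // the condition when it is at or above the threshold.
  final case class ExceedsCondition(threshold: Double) {
    def violates(value: Double): Boolean = value >= threshold
  }

  // Hypothetical condition table. The labels are assumed to match the
  // labels in the data exactly, since lookup is by exact string.
  val conditions: Map[String, ExceedsCondition] = Map(
    "glucose" -> ExceedsCondition(80.0),
    "fasting blood glucose" -> ExceedsCondition(80.0)
  )

  // True if at least one (label, value) reading violates its condition.
  // Readings whose label has no condition are ignored.
  def violatesAny(readings: Iterable[(String, Double)]): Boolean =
    readings.exists { case (label, value) =>
      conditions.get(label).exists(_.violates(value))
    }

  // Keep only the keys of the groups that contain a violating reading.
  def matchingKeys(groups: Seq[(String, Iterable[(String, Double)])]): Seq[String] =
    groups.collect { case (key, readings) if violatesAny(readings) => key }

  // Made-up sample data standing in for the grouped RDD.
  val sample: Seq[(String, Iterable[(String, Double)])] = Seq(
    "key1" -> List("glucose" -> 83.0, "glucose" -> 70.0),
    "key2" -> List("glucose" -> 79.9),
    "key3" -> List("fasting blood glucose" -> 80.0),
    "key4" -> List("cholesterol" -> 200.0)
  )
}
```

On an RDD of the same shape, the equivalent would presumably be rdd.filter { case (_, xs) => violatesAny(xs) }.keys, with the condition table read from the broadcast variable as in the answer.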