
Few conditions filter Apache Spark

I need to check a few conditions, so I filtered my RDD this way:

scala> file.filter(r => r(38)=="0").filter(r => r(2)=="0").filter(r => r(3)=="0").count

Is this a correct alternative to "&&"?

Yes, in your case a series of filters is semantically equivalent to a single filter with &&:

file.filter(r => r(38) == "0" && r(2) == "0" && r(3) == "0")

However, the single-filter variant is at most marginally faster; the two usually perform about the same. Two observations support this:

  1. && is a short-circuit operator: each subsequent comparison happens only if the previous one evaluates to true. A chain of filters short-circuits in the same way, because an element rejected by the first filter never reaches the second, so the number of comparisons is the same in both cases.

  2. filter is a narrow transformation, and Spark pipelines consecutive narrow transformations within a single pass over each partition, so the chained version does not rescan the RDD. It only adds a small per-element cost of extra closure calls and iterator layers, which is what the single filter with && avoids.
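The equivalence of the two forms can be checked with a minimal sketch on plain Scala collections (not Spark); the rows and the column indices 2, 3, and 38 are hypothetical stand-ins for the question's data:

```scala
// Each row is a sequence of 40 string fields, mirroring the question's r(i) accesses.
val rows = Seq(
  Seq.fill(40)("0"),                               // passes all three checks
  Seq.tabulate(40)(i => if (i == 2) "1" else "0"), // fails the r(2) == "0" check
  Seq.tabulate(40)(i => if (i == 38) "1" else "0") // fails the r(38) == "0" check
)

// Chained filters, as in the question.
val chained = rows.filter(r => r(38) == "0").filter(r => r(2) == "0").filter(r => r(3) == "0").size

// Single predicate combined with &&, as in the answer.
val combined = rows.count(r => r(38) == "0" && r(2) == "0" && r(3) == "0")

// Both select exactly the first row, so chained == combined == 1.
```

Note that unlike Spark's lazy, pipelined RDD transformations, eager collection filters do build an intermediate collection per step, so on plain collections the chained form really does make multiple passes.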
