
Few conditions filter Apache Spark

I need to check a few conditions, so I filtered my RDD this way:

scala> file.filter(r => r(38)=="0").filter(r => r(2)=="0").filter(r => r(3)=="0").count

Is this a correct alternative to "&&"?

Yes, in your case a series of filters is semantically equivalent to a single filter with &&:

file.filter(r => r(38) == "0" && r(2) == "0" && r(3) == "0")

However, the single-filter variant is at most marginally faster; the two usually perform about the same. Two observations support this:

  1. && is a short-circuit operator: each subsequent comparison happens only if the previous one evaluates to true. A chain of filters short-circuits in the same way, because an element rejected by the first filter never reaches the second, so the number of comparisons is the same in both cases.

  2. filter is a narrow transformation, and Spark pipelines consecutive narrow transformations within a single pass over each partition, so the chained version does not rescan the RDD. It only adds a small per-element cost of extra closure calls and iterator layers, which is what the single filter with && avoids.
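The equivalence of the two forms can be checked with a minimal sketch on plain Scala collections (not Spark); the rows and the column indices 2, 3, and 38 are hypothetical stand-ins for the question's data:

```scala
// Each row is a sequence of 40 string fields, mirroring the question's r(i) accesses.
val rows = Seq(
  Seq.fill(40)("0"),                               // passes all three checks
  Seq.tabulate(40)(i => if (i == 2) "1" else "0"), // fails the r(2) == "0" check
  Seq.tabulate(40)(i => if (i == 38) "1" else "0") // fails the r(38) == "0" check
)

// Chained filters, as in the question.
val chained = rows.filter(r => r(38) == "0").filter(r => r(2) == "0").filter(r => r(3) == "0").size

// Single predicate combined with &&, as in the answer.
val combined = rows.count(r => r(38) == "0" && r(2) == "0" && r(3) == "0")

// Both select exactly the first row, so chained == combined == 1.
```

Note that unlike Spark's lazy, pipelined RDD transformations, eager collection filters do build an intermediate collection per step, so on plain collections the chained form really does make multiple passes.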
