
Filtering on a few conditions in Apache Spark

I need to check a few conditions, so I filtered my RDD this way:

scala> file.filter(r => r(38)=="0").filter(r => r(2)=="0").filter(r => r(3)=="0").count

Is this a correct alternative to "&&"?

Yes, a series of filters is semantically equivalent to one filter with && in your case.

file.filter(r => r(38) == "0" && r(2) == "0" && r(3) == "0")
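The equivalence is easy to check on a small sample. The sketch below is only a model. It uses a plain Scala Seq in place of the RDD so that it runs without Spark, and the rows are hypothetical 39-field Array[String] values (39 because the question reads r(38)). Seq.filter builds intermediate collections, which Spark does not do, so this demonstrates the semantics only, not how Spark evaluates the filters.

```scala
// Model of the question's RDD as a plain Scala Seq (assumption: no Spark needed).
object FilterEquivalence {
  // Build a hypothetical 39-field row: the three checked columns are set
  // explicitly and every other column is "x".
  def row(c2: String, c3: String, c38: String): Array[String] = {
    val r = Array.fill(39)("x")
    r(2) = c2; r(3) = c3; r(38) = c38
    r
  }

  // Hypothetical sample data.
  val rows: Seq[Array[String]] = Seq(
    row("0", "0", "0"), // passes all three checks
    row("1", "0", "0"), // fails r(2)
    row("0", "1", "0"), // fails r(3)
    row("0", "0", "1"), // fails r(38)
    row("0", "0", "0")  // passes all three checks
  )

  // Chained filters, as in the question.
  def chained(data: Seq[Array[String]]): Int =
    data.filter(r => r(38) == "0").filter(r => r(2) == "0").filter(r => r(3) == "0").size

  // A single filter combining the predicates with &&.
  def combined(data: Seq[Array[String]]): Int =
    data.filter(r => r(38) == "0" && r(2) == "0" && r(3) == "0").size

  def main(args: Array[String]): Unit =
    println(s"chained = ${chained(rows)}, combined = ${combined(rows)}")
}
```

Running main prints chained = 2, combined = 2: only the first and last rows pass all three checks.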

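However, the variant above is at most marginally faster than the chained version, and not for the reasons often given:

  1. && is a short-circuit operator: the next comparison happens only if the previous one evaluates to true. Likewise, in the chained version a record reaches the next filter only if it passed the previous one. The number of comparisons is therefore the same in both cases.

  2. The chained version does not make three passes over the RDD. filter is a narrow transformation, and Spark pipelines consecutive narrow transformations within a single stage, so each record flows through all three predicates in one pass. The chained version only adds a small per-record overhead from the extra iterator and function calls.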
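How chained filters are evaluated can be checked with a small model of one partition. Spark's RDD.filter is lazy and, roughly speaking, applies the predicate to each partition's iterator, so the sketch below uses a plain Scala Iterator as a stand-in for a partition (an assumption, so it runs without Spark). It counts how many times each source row is read and how many field comparisons are made. The tuple (c38, c2, c3) plays the role of r(38), r(2), r(3), and the data is hypothetical.

```scala
// Model of one Spark partition as a lazy Scala Iterator (assumption, not Spark itself).
object SinglePassSketch {
  // Counters: source rows pulled, and field comparisons evaluated.
  var reads = 0
  var comparisons = 0

  // Hypothetical partition contents: each row is (c38, c2, c3).
  val partitionRows: Seq[(String, String, String)] = Seq(
    ("0", "0", "0"),
    ("1", "0", "0"),
    ("0", "1", "0"),
    ("0", "0", "1"),
    ("0", "0", "0")
  )

  // Source iterator that counts every row pulled from it.
  def source(): Iterator[(String, String, String)] =
    partitionRows.iterator.map { r => reads += 1; r }

  // Compare one field against "0", counting the comparison.
  def isZero(v: String): Boolean = { comparisons += 1; v == "0" }

  def reset(): Unit = { reads = 0; comparisons = 0 }

  // Chained lazy filters: composed into one pipeline, not separate passes.
  // Returns (matching rows, source reads, comparisons).
  def chained(): (Int, Int, Int) = {
    reset()
    val n = source()
      .filter(r => isZero(r._1))
      .filter(r => isZero(r._2))
      .filter(r => isZero(r._3))
      .size
    (n, reads, comparisons)
  }

  // One filter using short-circuiting &&.
  def combined(): (Int, Int, Int) = {
    reset()
    val n = source().filter(r => isZero(r._1) && isZero(r._2) && isZero(r._3)).size
    (n, reads, comparisons)
  }

  def main(args: Array[String]): Unit = {
    println(s"chained  (count, reads, comparisons) = ${chained()}")
    println(s"combined (count, reads, comparisons) = ${combined()}")
  }
}
```

Both chained() and combined() return (2, 5, 12): two matching rows, five source reads (each row is read exactly once), and twelve comparisons. The chained filters form one lazy pipeline rather than three passes. What they add is a few extra iterator calls per row.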


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM