Scala/Python vs. Java: SparkContext.map vs. .filter in PI example?

In the Estimating Pi example at http://spark.apache.org/examples.html, there is a discrepancy between the Python/Scala and Java versions that I don't understand. Python and Scala both use map and reduce:

Python

from random import random

def sample(p):
    # Draw a random point in the unit square; 1 if it falls inside the unit circle
    x, y = random(), random()
    return 1 if x*x + y*y < 1 else 0

count = spark.parallelize(range(0, NUM_SAMPLES)).map(sample) \
             .reduce(lambda a, b: a + b)
print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))

Scala

val count = spark.parallelize(1 to NUM_SAMPLES).map{i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

But Java is using filter:

// count() returns a long, so the result is stored in a long
long count = spark.parallelize(makeRange(1, NUM_SAMPLES)).filter(
  new Function<Integer, Boolean>() {
    public Boolean call(Integer i) {
      double x = Math.random();
      double y = Math.random();
      return x*x + y*y < 1;
    }
  }).count();
// 4.0 forces floating-point division
System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES);

Is this just a doc typo/bug? Is filter preferable in Java and map/reduce preferred in Scala and Python for some reason?

These approaches are equivalent. The Java code simply counts the cases where the Scala / Python map would return 1. To make this a bit more transparent:

def inside(x, y):
    """Check if point (x, y) is inside a unit circle
    with center in the origin (0, 0)"""
    return x*x + y*y < 1

points = ... 

# Scala / Python code is equivalent to this
sum([1 if inside(x, y) else 0 for (x, y) in points])

# While Java code is equivalent to this
len([(x, y) for (x, y) in points if inside(x, y)])

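Finally, the count you get, divided by the number of samples, approximates the fraction of the enclosing unit square covered by the quarter of the unit circle that lies inside it. That fraction equals π/4, so multiplying it by 4 gives the estimate of π.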
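
If you want to check the equivalence yourself without a cluster, here is a minimal plain-Python sketch (no Spark involved). The function name estimate_pi, the seed parameter and the sample size are my own choices for illustration, not part of the Spark examples. Both x and y are drawn from [0, 1), so the points land in the unit square and the quarter circle inside it has area π/4.

```python
import random

def inside(x, y):
    """Check if point (x, y) is inside a unit circle centered at the origin."""
    return x * x + y * y < 1

def estimate_pi(num_samples, seed=0):
    """Hypothetical helper: count hits both ways over the same points."""
    rng = random.Random(seed)  # fixed seed so both counts see identical points
    points = [(rng.random(), rng.random()) for _ in range(num_samples)]

    # map + reduce style (what the Scala / Python examples do)
    count_map_reduce = sum(1 if inside(x, y) else 0 for (x, y) in points)

    # filter + count style (what the Java example does)
    count_filter = len([(x, y) for (x, y) in points if inside(x, y)])

    # hits / samples approximates pi / 4, so scale by 4
    pi_estimate = 4.0 * count_map_reduce / num_samples
    return count_map_reduce, count_filter, pi_estimate

if __name__ == "__main__":
    a, b, pi_estimate = estimate_pi(100000)
    print("map/reduce count:", a)
    print("filter count:    ", b)
    print("Pi is roughly %f" % pi_estimate)
```

The two counts should always match, because mapping to 0/1 and summing is just another way of counting the elements that pass the predicate. Which one to use in Spark is mostly a matter of taste.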
