Scala / Python vs. Java: SparkContext.map vs. .filter in the Pi example?

Question

In the Estimating Pi example at http://spark.apache.org/examples.html, I don't understand why the Python / Scala examples differ from the Java one. Python and Scala both use map and reduce:

Python

from random import random  # import needed by sample()

def sample(p):
    x, y = random(), random()
    return 1 if x*x + y*y < 1 else 0

count = spark.parallelize(xrange(0, NUM_SAMPLES)).map(sample) \
             .reduce(lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)

Scala

val count = spark.parallelize(1 to NUM_SAMPLES).map{i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)

But the Java version uses filter:

// count() returns a long, so the result is stored in a long
long count = spark.parallelize(makeRange(1, NUM_SAMPLES)).filter(
  new Function<Integer, Boolean>() {
    public Boolean call(Integer i) {
      double x = Math.random();
      double y = Math.random();
      return x*x + y*y < 1;
    }
  }).count();
// 4.0 avoids integer division, which would truncate the estimate
System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES);

Is this just a documentation error / bug? Is filter preferred in Java, while map / reduce is preferred in Scala and Python for some reason?

Answer 1

The two approaches are equivalent. The Java code simply counts the cases where the Scala / Python map would return 1. To make that more transparent:

def inside(x, y):
    """Check if point (x, y) is inside a unit circle
    with center in the origin (0, 0)"""
    return x*x + y*y < 1

points = ... 

# Scala / Python code is equivalent to this
sum([1 if inside(x, y) else 0 for (x, y) in points])

# While Java code is equivalent to this
len([(x, y) for (x, y) in points if inside(x, y)])

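Either way, the resulting count divided by NUM_SAMPLES estimates the fraction of the unit square covered by the circle. Since x and y are both drawn from [0, 1), that region is a quarter of the unit disk, with area π/4, so multiplying the fraction by 4 gives an estimate of π.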
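To check the equivalence without a Spark cluster, here is a minimal plain-Python sketch. The names estimate_pi, seed and rng are made up for this illustration and are not part of the Spark examples; the list of points stands in for the RDD, and the seeded generator only ensures that both counting styles see exactly the same points.

```python
import random


def inside(x, y):
    """Return True if point (x, y) lies inside the unit circle centred at (0, 0)."""
    return x * x + y * y < 1


def estimate_pi(num_samples, seed=42):
    """Hypothetical helper: count hits in both styles and estimate pi.

    Returns (count via map/reduce style, count via filter/count style, estimate).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    # Sample points uniformly from the unit square [0, 1) x [0, 1)
    points = [(rng.random(), rng.random()) for _ in range(num_samples)]

    # Scala / Python style: map each point to 0 or 1, then reduce by addition
    count_map_reduce = sum(1 if inside(x, y) else 0 for x, y in points)

    # Java style: keep only the points inside the circle, then count them
    count_filter = len([p for p in points if inside(*p)])

    # Fraction of hits approximates pi / 4, so scale by 4
    return count_map_reduce, count_filter, 4.0 * count_map_reduce / num_samples
```

Calling estimate_pi(100000) should return two identical counts plus an estimate close to π. The RDD API offers map, reduce, filter and count in all three languages, so the choice between the two styles is a matter of taste rather than a language requirement.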

Answer 1
3 · accepted · 2015-11-24 17:42:07
