Spark reduce and map issue
I ran a small experiment in Spark and ran into trouble.
wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]
# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
.map(lambda x: (x,1)) <== something may be wrong with this line
.reduce(sum)) <== something may be wrong with this line
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
# TEST Mean using reduce (3b)
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')
I came up with my solution:
from operator import add
totalCount = (wordCounts
.map(lambda x: x[1])
.reduce(add))
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
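The fix can be checked without a cluster by simulating the RDD pipeline with plain Python (a sketch only, not Spark itself; `map` and `reduce` here are the builtins, not RDD methods):

```python
from functools import reduce  # builtin in Python 2; imported from functools in Python 3
from operator import add

word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# .map(lambda x: x[1]) keeps only the counts; .reduce(add) sums them
total_count = reduce(add, map(lambda x: x[1], word_counts))

# three unique words, so the mean count is 5 / 3
average = total_count / float(len(word_counts))
print(total_count)        # 5
print(round(average, 2))  # 1.67
```

This mirrors why the original attempt failed: `.map(lambda x: (x,1))` wraps each pair in another tuple instead of extracting the count, and `reduce(sum)` passes two elements to `sum`, which expects an iterable; `reduce(add)` over the counts is what the test expects.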
I'm not sure myself, but from looking at your code I can see some problems. The 'map' function can't be used on a list like 'list_name.map(some stuff)'; you need to call it like 'variable = map(function, arguments)', and if you are using Python 3 you need to do 'variable = list(map(function, arguments))'. Hope that helps :)
Another, similar approach: you can also read the list as key-value pairs and use distinct():
from operator import add
totalCount = (wordCounts
.map(lambda kv: kv[1])
.reduce(add))
average = totalCount / float(wordCounts.distinct().count())
print totalCount
print round(average, 2)
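The distinct()-based variant can likewise be sketched in plain Python (an illustration, not Spark; a set plays the role of distinct() for this small list):

```python
from functools import reduce
from operator import add

word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Sum the counts, as before
total_count = reduce(add, (v for _, v in word_counts))

# distinct() drops duplicate (key, value) pairs; a set does the same here
n_distinct = len(set(word_counts))

average = total_count / float(n_distinct)
print(round(average, 2))  # 1.67
```

Note that distinct() counts unique (key, value) pairs, so this only matches the reduceByKey-based denominator when each word appears as a single pair, as in this dataset.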