Spark reduce and map issue
I ran a small experiment in Spark and ran into trouble.
wordCounts is: [('rat', 2), ('elephant', 1), ('cat', 2)]
# TODO: Replace <FILL IN> with appropriate code
from operator import add
totalCount = (wordCounts
.map(lambda x: (x,1)) <== something may be wrong with this line
.reduce(sum)) <== something may be wrong with this line
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
# TEST Mean using reduce (3b)
Test.assertEquals(round(average, 2), 1.67, 'incorrect value of average')
I came up with my solution:
from operator import add
totalCount = (wordCounts
.map(lambda x: x[1])
.reduce(add))
average = totalCount / float(wordsRDD.map(lambda x: (x,1)).reduceByKey(add).count())
print totalCount
print round(average, 2)
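The fix can be checked without a cluster by simulating the RDD pipeline with plain Python (a sketch only, not Spark itself; `map` and `reduce` here are the builtins, not RDD methods):

```python
from functools import reduce  # builtin in Python 2; imported from functools in Python 3
from operator import add

word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# .map(lambda x: x[1]) keeps only the counts; .reduce(add) sums them
total_count = reduce(add, map(lambda x: x[1], word_counts))

# three unique words, so the mean count is 5 / 3
average = total_count / float(len(word_counts))
print(total_count)        # 5
print(round(average, 2))  # 1.67
```

This mirrors why the original attempt failed: `.map(lambda x: (x,1))` wraps each pair in another tuple instead of extracting the count, and `reduce(sum)` passes two elements to `sum`, which expects an iterable; `reduce(add)` over the counts is what the test expects.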
I'm not sure myself, but from looking at your code I can see some problems. The 'map' function can't be used on a list like 'list_name.map(some stuff)'; you need to call it like 'variable = map(function, arguments)', and if you are using Python 3 you need to do 'variable = list(map(function, arguments))'. Hope that helps :)
Another, similar approach: you can also read the list as key-value pairs and use distinct():
from operator import add
totalCount = (wordCounts
.map(lambda kv: kv[1])
.reduce(add))
average = totalCount / float(wordCounts.distinct().count())
print totalCount
print round(average, 2)
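The distinct()-based variant can likewise be sketched in plain Python (an illustration, not Spark; a set plays the role of distinct() for this small list):

```python
from functools import reduce
from operator import add

word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Sum the counts, as before
total_count = reduce(add, (v for _, v in word_counts))

# distinct() drops duplicate (key, value) pairs; a set does the same here
n_distinct = len(set(word_counts))

average = total_count / float(n_distinct)
print(round(average, 2))  # 1.67
```

Note that distinct() counts unique (key, value) pairs, so this only matches the reduceByKey-based denominator when each word appears as a single pair, as in this dataset.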