Convert Apache Spark Scala code to Python
Can anyone convert this very simple Scala code to Python?
val words = Array("one", "two", "two", "three", "three", "three")
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1))
val wordCountsWithGroup = wordPairsRDD
  .groupByKey()
  .map(t => (t._1, t._2.sum))
  .collect()
Try this:
words = ["one", "two", "two", "three", "three", "three"]
wordPairsRDD = sc.parallelize(words).map(lambda word: (word, 1))
# Parentheses let the method chain span several lines in Python
wordCountsWithGroup = (wordPairsRDD
    .groupByKey()
    .map(lambda t: (t[0], sum(t[1])))
    .collect())
Two ways to translate this into Python:
from operator import add
wordsList = ["one", "two", "two", "three", "three", "three"]
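# Version 1: reduceByKey sums the counts for each word
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).reduceByKey(add).collect()
print(words)
# Version 2: groupByKey collects the 1s per word, then sum adds them up
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).groupByKey().map(lambda t: (t[0], sum(t[1]))).collect()
print(words)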
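If you want to check what the two versions above compute without starting Spark, here is a rough plain-Python sketch of the same logic. It only imitates the semantics of groupByKey plus sum and of reduceByKey on a local list; it is not how Spark implements them, and the helper names group_by_key_then_sum and reduce_by_key are made up for illustration.

```python
from collections import defaultdict
from operator import add

words_list = ["one", "two", "two", "three", "three", "three"]
# Same (word, 1) pairs that the map step produces in the Spark code
pairs = [(word, 1) for word in words_list]


def group_by_key_then_sum(pairs):
    """Hypothetical helper: gather all values per key, then sum each group."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


def reduce_by_key(pairs, func):
    """Hypothetical helper: fold values per key with func, keeping one running value."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


counts_grouped = group_by_key_then_sum(pairs)
counts_reduced = reduce_by_key(pairs, add)
print(counts_grouped)
print(counts_reduced)
```

Both helpers return the same counts. The grouping version holds every value for a key before summing, while the reducing version keeps only one running total per key.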
Assuming you already have your Spark context defined and ready:
from operator import add
words = ["one", "two", "two", "three", "three", "three"]
# Parentheses let the method chain span several lines in Python
wordsPairRDD = (sc.parallelize(words).map(lambda word: (word, 1))
    .reduceByKey(add)
    .collect())
Check out the GitHub examples repo: Python examples