Converting Apache Spark Scala code to Python

Question

Can anyone convert this very simple Scala code to Python?

val words = Array("one", "two", "two", "three", "three", "three")
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1))

val wordCountsWithGroup = wordPairsRDD
    .groupByKey()
    .map(t => (t._1, t._2.sum))
    .collect()
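The Scala pipeline pairs each word with 1, groups the pairs by word, and sums each group. As a rough local sketch of those semantics (plain Python with no Spark involved; the helpers to_pairs and group_by_key are hypothetical stand-ins for the RDD operations, not PySpark APIs), the same steps look like this:

```python
from collections import defaultdict

def to_pairs(words):
    # Stand-in for .map(word => (word, 1)): tag every word with a count of 1
    return [(word, 1) for word in words]

def group_by_key(pairs):
    # Hypothetical stand-in for groupByKey(): gather all values sharing a key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())

words = ["one", "two", "two", "three", "three", "three"]
grouped = group_by_key(to_pairs(words))

# Stand-in for .map(t => (t._1, t._2.sum)): in Python, t[0] is the key
# and t[1] is the collection of grouped values
word_counts = [(t[0], sum(t[1])) for t in grouped]

# Local order follows first appearance; Spark's collect() order may differ
print(word_counts)  # [('one', 1), ('two', 2), ('three', 3)]
```

Note how Scala's t._1 and t._2 become t[0] and t[1] in Python.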

Answer 1

Try this:

words = ["one", "two", "two", "three", "three", "three"]
wordPairsRDD = sc.parallelize(words).map(lambda word: (word, 1))

# Wrap the chain in parentheses so it can span several lines in Python
wordCountsWithGroup = (wordPairsRDD
    .groupByKey()
    .map(lambda t: (t[0], sum(t[1])))
    .collect())

Answer 2

Two Python translations, one using reduceByKey and one using groupByKey:

from operator import add

wordsList = ["one", "two", "two", "three", "three", "three"]

# Version 1: reduceByKey adds up the 1s for each word
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).reduceByKey(add).collect()
print(words)

# Version 2: groupByKey, then sum each group (direct translation of the Scala)
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).groupByKey().map(lambda t: (t[0], sum(t[1]))).collect()
print(words)
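Both versions produce the same counts; they differ in how the sum is formed. reduceByKey(add) folds each word's values pairwise, and because addition is associative and commutative Spark can merge partial sums on each partition before shuffling, whereas groupByKey() moves every individual 1 across the shuffle and sums afterwards. For that reason reduceByKey is usually preferred for aggregations like this. The plain-Python sketch below shows only the folding semantics and does not model partitions or shuffles; reduce_by_key is a hypothetical helper, not the PySpark method:

```python
from operator import add

def reduce_by_key(pairs, func):
    # Hypothetical stand-in for RDD.reduceByKey(func):
    # keep one running result per key, folding in each new value with func
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())

words_list = ["one", "two", "two", "three", "three", "three"]
pairs = [(word, 1) for word in words_list]

# reduceByKey(add) style: running sum per word
reduced = reduce_by_key(pairs, add)

# groupByKey() + sum style, for comparison: gather all 1s, then add them up
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)
summed = [(key, sum(values)) for key, values in grouped.items()]

print(reduced)            # [('one', 1), ('two', 2), ('three', 3)]
print(reduced == summed)  # True
```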

Answer 3

Assuming you have already created the Spark context sc and it is ready to use:

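from operator import add

words = ["one", "two", "two", "three", "three", "three"]
# Parentheses allow the multi-line chain; after collect() this holds
# a local list of (word, count) tuples, not an RDD
wordsPairRDD = (sc.parallelize(words).map(lambda word: (word, 1))
    .reduceByKey(add)
    .collect())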

See the GitHub examples repo: Python examples

3 answers

Answer 1: 5 votes, accepted, 2015-06-12 20:49:40

Answer 2: 2 votes, 2015-06-12 20:53:52

Answer 3: 2 votes, 2015-06-12 20:58:05
