
Convert Apache Spark Scala code to Python

Can anyone convert this very simple Scala code to Python?

val words = Array("one", "two", "two", "three", "three", "three")
val wordPairsRDD = sc.parallelize(words).map(word => (word, 1))

val wordCountsWithGroup = wordPairsRDD
    .groupByKey()
    .map(t => (t._1, t._2.sum))
    .collect()

Try this:

words = ["one", "two", "two", "three", "three", "three"]
wordPairsRDD = sc.parallelize(words).map(lambda word: (word, 1))

# Parentheses are needed so the method chain can span several lines in Python
wordCountsWithGroup = (wordPairsRDD
    .groupByKey()
    .map(lambda t: (t[0], sum(t[1])))
    .collect())

Two translations in Python:

from operator import add
wordsList = ["one", "two", "two", "three", "three", "three"]
# Version 1: reduceByKey sums the counts per key directly
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).reduceByKey(add).collect()
print(words)
# Version 2: direct translation of the Scala code using groupByKey
words = sc.parallelize(wordsList).map(lambda l: (l, 1)).groupByKey().map(lambda t: (t[0], sum(t[1]))).collect()
print(words)
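
To make the difference between the two versions concrete, here is a minimal plain-Python sketch that needs no Spark installation. The helpers group_by_key and reduce_by_key are hypothetical stand-ins written for illustration, not PySpark APIs; they only mimic what the corresponding RDD methods compute for this small input, not how Spark distributes the work.

```python
from collections import defaultdict
from operator import add

words = ["one", "two", "two", "three", "three", "three"]
pairs = [(word, 1) for word in words]


def group_by_key(pairs):
    """Hypothetical stand-in for groupByKey: keep every value for each key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_by_key(func, pairs):
    """Hypothetical stand-in for reduceByKey: fold values into one result per key."""
    totals = {}
    for key, value in pairs:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals


# groupByKey style: build the full list of values first, then sum each list
grouped_counts = {key: sum(values) for key, values in group_by_key(pairs).items()}

# reduceByKey style: keep only a running total per key
reduced_counts = reduce_by_key(add, pairs)

print(grouped_counts)
print(reduced_counts)
```

Both dictionaries hold the same counts. The grouping variant materializes every value before summing, while the reducing variant carries only one running total per key, which is the usual reason reduceByKey is preferred for simple aggregations like this one.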

Assuming you already have a Spark context defined and ready to go:

from operator import add
words = ["one", "two", "two", "three", "three", "three"]
# collect() returns a plain list of (word, count) tuples, not an RDD
wordCounts = (sc.parallelize(words)
    .map(lambda word: (word, 1))
    .reduceByKey(add)
    .collect())

Check out the GitHub examples repo: Python Examples


 