pyspark NameError: global name 'accumulators' is not defined
I followed the quick start tutorial.
My script is:
from pyspark import SparkContext
logFile = 'README.md'
sc = SparkContext('local', 'Simple App')
logData = sc.textFile(logFile).cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print 'Lines with a: %i, lines with b: %i' % (numAs, numBs)
I ran the script on the command line:
$SPARK_HOME/bin/spark-submit --master local[2] SimpleApp.py
Traceback (most recent call last):
  File "/home/huayu/Programs/Machine_learning/spark_exe/quick_start/SimpleApp.py", line 4, in <module>
    sc = SparkContext('local', 'Simple App')
  File "/home/huayu/Downloads/Software/spark/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/home/huayu/Downloads/Software/spark/python/pyspark/context.py", line 174, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
NameError: global name 'accumulators' is not defined
When I ran python SimpleApp.py, it worked fine.
I got Spark from https://github.com/GUG11/spark (version 2.1.0) and I use Python 2.7.12.
There is another question about the Spark accumulator (pyspark ImportError: cannot import name accumulators), but the error message in my case is different.
You forgot to call getOrCreate(), which actually creates the Spark context/session. In 2021 you would rather use the Spark session than the Spark context, as the quick start now shows at the same link: http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications
"""SimpleApp.py"""
from pyspark.sql import SparkSession
logFile = "YOUR_SPARK_HOME/README.md" # Should be some file on your system
spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
logData = spark.read.text(logFile).cache()
numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
spark.stop()
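Since the question reports that the script works with python but fails under spark-submit, it may also be worth checking whether both launchers pick up the same pyspark package. The following is only a diagnostic sketch, not a confirmed fix: it assumes Python 3 (importlib.util.find_spec does not exist in Python 2.7), and the helper name describe_module is made up for illustration. That a mismatched pyspark copy causes the NameError is an assumption, not something the question confirms.

```python
import importlib.util
import sys


def describe_module(name):
    """Return the file a top-level module would be imported from, or None if it is not found.

    Hypothetical helper for illustration; it only locates the module and does not import it.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Print which interpreter is running and where pyspark would be loaded from.
# Prints None for pyspark if it is not importable in this environment.
print("interpreter:", sys.executable)
print("pyspark from:", describe_module("pyspark"))
```

Running this once with python and once through spark-submit, then comparing the two printed paths, would show whether the two launch methods resolve pyspark to different locations.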