I tried the Python example code in Zeppelin (the web notebook for Spark on AWS EMR) and got an error when running it. The output I expected is a word count of a file in my S3 storage:
text_file = sc.textFile("s3://mybuckettest2/Scenarios.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("s3://mybuckettest2/test.txt")
The error:
Traceback (most recent call last):
File "/tmp/zeppelin_python-2374039163027007666.py", line 319, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_python-2374039163027007666.py", line 307, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 1, in <module>
NameError: name 'sc' is not defined
I found this in the documentation:
SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as variable names sc, sqlContext and z, respectively, in Scala, Python and R environments. Starting from 0.6.1 SparkSession is available as variable spark when you are using Spark 2.x.
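Note that the passage lists Python alongside Scala and R, so sc is not Scala-only, and switching to sqlContext is unlikely to help on its own. The traceback refers to a file named zeppelin_python-..., which suggests the paragraph ran under the plain Python interpreter (%python), where none of these variables are injected. Starting the paragraph with %pyspark should make sc available in Python.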
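Whichever interpreter ends up supplying sc, the expected result of the job can be checked locally first. The following is only a rough sketch in plain Python, not Spark code: the function name word_count and the sample lines are made up for illustration, and it mirrors the flatMap / map / reduceByKey steps from the question by splitting each line on single spaces.

```python
from collections import Counter


def word_count(lines):
    """Count words like the Spark job in the question: split on single spaces."""
    counts = Counter()
    for line in lines:
        # Equivalent of flatMap(split) + map((word, 1)) + reduceByKey(add).
        counts.update(line.split(" "))
    return dict(counts)


# Hypothetical sample input standing in for the lines of the S3 file.
sample = ["to be or", "not to be"]
print(sorted(word_count(sample).items()))
```

Because it splits on a single space exactly as the question's code does, consecutive spaces produce empty-string "words" here too, which is worth knowing before comparing against the Spark output.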