Spark Python in Zeppelin on AWS: error running program

I tried the Python example code in the Zeppelin web notebook for Spark on AWS EMR and got an error when running it. The output I expected is a word count of a file in my S3 storage.

text_file = sc.textFile("s3://mybuckettest2/Scenarios.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("s3://mybuckettest2/test.txt")

The error:

 Traceback (most recent call last):
  File "/tmp/zeppelin_python-2374039163027007666.py", line 319, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_python-2374039163027007666.py", line 307, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 1, in <module>
NameError: name 'sc' is not defined

I found this in the documentation:

SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as variable names sc, sqlContext and z, respectively, in Scala, Python and R environments. Starting from 0.6.1 SparkSession is available as variable spark when you are using Spark 2.x.

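The quoted passage lists Python among the environments where sc is exposed, so sc is not specific to Scala. The traceback path /tmp/zeppelin_python-... suggests the paragraph ran under the plain %python interpreter, which does not create a SparkContext. Starting the paragraph with %pyspark (or %spark.pyspark, depending on how the interpreter is bound in your Zeppelin version) should make sc available.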
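If it is unclear which interpreter a paragraph is running under, one rough check is to see which of the documented names are actually defined. The snippet below is only a sketch: the helper injected_names and the tuple ZEPPELIN_SPARK_NAMES are hypothetical names made up for illustration and are not part of Zeppelin or Spark.

```python
# Names that the documentation quoted above says Zeppelin exposes
# in its Spark environments. The tuple itself is a hypothetical helper.
ZEPPELIN_SPARK_NAMES = ("sc", "sqlContext", "spark", "z")


def injected_names(namespace):
    """Return the Zeppelin-provided names that are present in `namespace`."""
    return [name for name in ZEPPELIN_SPARK_NAMES if name in namespace]


# Inspect the namespace of the current paragraph.
found = injected_names(globals())
if "sc" in found:
    print("Spark variables available:", found)
else:
    print("'sc' is not defined; this paragraph is probably not "
          "running under a Spark interpreter")
```

If this prints that sc is not defined, the paragraph is most likely bound to an interpreter that does not set up Spark, and the same code should be tried in a paragraph that starts with the Spark Python interpreter directive.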
