I've installed DataStax Enterprise 4.6 on a cluster and I can't figure out why pyspark throws this error. The Scala interface works fine, but the Python one doesn't. Does anyone have a clue how to fix this?
Python 2.6.6, CentOS 6.5
Cheers
bash-4.1$ dse pyspark --master spark://IP:7077
Python 2.6.6 (r266:84292, Jan 22 2014, 01:49:05)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "/usr/share/dse/spark/python/pyspark/shell.py", line 33, in <module>
import pyspark
File "/usr/share/dse/spark/python/pyspark/__init__.py", line 63, in <module>
from pyspark.context import SparkContext
File "/usr/share/dse/spark/python/pyspark/context.py", line 34, in <module>
from pyspark import rdd
File "/usr/share/dse/spark/python/pyspark/rdd.py", line 1972
return {convertColumnValue(v) for v in columnValue}
^
SyntaxError: invalid syntax
>>>
The PySpark support included in DSE 4.6 requires Python 2.7.x. The line flagged in the traceback is a set comprehension, syntax that was only added in Python 2.7, so Python 2.6.x rejects it at compile time with exactly the SyntaxError you're seeing. An upcoming patch release should fix the problem with Python 2.6.x; there is no specific date yet.
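To illustrate the incompatibility, here is a minimal sketch of the failing construct and a Python 2.6-compatible rewrite. The `convert` helper and sample data are hypothetical stand-ins for the DSE code; only the syntax difference matters:

```python
# Hypothetical stand-in for the converter used in rdd.py line 1972.
def convert(v):
    return str(v)

column_value = [1, 2, 3]

# Set comprehension: valid on Python 2.7+ / 3.x, but a SyntaxError on 2.6,
# which is why pyspark fails to even import on Python 2.6.
converted = {convert(v) for v in column_value}

# Python 2.6-compatible equivalent: pass a generator expression to set().
converted_26 = set(convert(v) for v in column_value)

assert converted == converted_26
```

Because a SyntaxError is raised when the module is compiled, not when the line runs, merely importing `pyspark.rdd` under Python 2.6 fails, before any Spark code executes.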