I have been working with Spark (built for Hadoop 2.7) and Python in Eclipse, and I am trying to run the sample "word count" example. Here is my code:
# Imports
# Take care with unused imports (and also unused variables):
# please comment them all out, otherwise you will get errors at execution.
# Note that neither the directive "@PydevCodeAnalysisIgnore" nor "@UnusedImport"
# will be able to solve that issue.
#from pyspark.mllib.clustering import KMeans
from pyspark import SparkConf, SparkContext
import os
# Configure the Spark environment
sparkConf = SparkConf().setAppName("WordCounts").setMaster("local")
sc = SparkContext(conf = sparkConf)
# The WordCounts Spark program
textFile = sc.textFile(os.environ["SPARK_HOME"] + "/README.md")
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect(): print wc
and then I got the following errors:
17/08/07 12:28:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/07 12:28:16 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "/home/hduser/eclipse-workspace/PythonSpark/src/WordCounts.py", line 12, in <module>
    sc = SparkContext(conf = sparkConf)
  File "/usr/local/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/usr/local/spark/python/pyspark/context.py", line 186, in _do_init
    self._accumulatorServer = accumulators._start_update_server()
  File "/usr/local/spark/python/pyspark/accumulators.py", line 259, in _start_update_server
    server = AccumulatorServer(("localhost", 0), _UpdateRequestHandler)
  File "/usr/lib/python2.7/SocketServer.py", line 417, in __init__
    self.server_bind()
  File "/usr/lib/python2.7/SocketServer.py", line 431, in server_bind
    self.socket.bind(self.server_address)
  File "/usr/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -3] Temporary failure in name resolution
Any help? I can run any Spark project with Scala using spark-shell, and any (non-Spark) Python program in Eclipse, with no errors. I think my problem is with PySpark. Is there anything I can do?
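The traceback fails inside socket.bind on ("localhost", 0), before any Spark job runs, so one way to narrow this down is to try the same bind outside Spark. The following is only a diagnostic sketch: the helper name can_bind is hypothetical, and it assumes nothing beyond the Python standard library. If "127.0.0.1" binds but "localhost" does not, the machine is probably not resolving the name localhost (a missing "127.0.0.1 localhost" entry in /etc/hosts is one common cause, though that is an assumption about this setup).

```python
import socket


def can_bind(host, port=0):
    """Try to bind a TCP socket to (host, port).

    Returns (True, None) on success, or (False, error) if the bind fails.
    Port 0 asks the OS for any free port, as in the traceback above.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True, None
    except socket.error as exc:  # socket.gaierror is a subclass of socket.error
        return False, exc
    finally:
        sock.close()


if __name__ == "__main__":
    # Compare name-based and address-based binding on the loopback interface.
    for host in ("localhost", "127.0.0.1"):
        ok, err = can_bind(host)
        print("%s: %s" % (host, "ok" if ok else "failed (%s)" % err))
```

Running this with the same interpreter that Eclipse uses for the PySpark project keeps the comparison fair.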
You could try this: just creating a SparkContext with no arguments is enough, and it works.
from pyspark import SparkContext
sc = SparkContext()
# The WordCounts Spark program
textFile = sc.textFile("/home/your/path/Test.txt")  # or right-click the file, copy its path, and paste it here
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect():
    print wc
Try it this way.
After you start the Spark shell (pyspark), sc is already available at the prompt as a SparkContext.
If it is not available, you can create it like this:
>>> from pyspark import SparkContext
>>> sc = SparkContext()
Now you can use sc.
This is enough to run your program, because sc is available in your shell.
First try this in shell mode, line by line:
textFile = sc.textFile("/home/your/path/Test.txt")  # or right-click the file, copy its path, and paste it here
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect():
    print wc
As per my understanding the below code should work if Spark is installed properly.
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("WordCount")
sc = SparkContext(conf = conf)
input = sc.textFile("file:///sparkcourse/PATH_NAME")
words = input.flatMap(lambda x: x.split())
wordCounts = words.countByValue()
for word, count in wordCounts.items():
    cleanWord = word.encode('ascii', 'ignore')
    if (cleanWord):
        print(cleanWord.decode() + " " + str(count))
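To check what output to expect without involving Spark at all, the same split-and-count logic can be sketched in plain Python. This is only an illustration, not part of the answer's code: count_words and the sample lines are hypothetical, and collections.Counter stands in for what countByValue returns to the driver.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # line.split() mirrors the flatMap(lambda x: x.split()) step
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    sample = ["to be or not to be", "to see or not to see"]
    for word, count in sorted(count_words(sample).items()):
        print(word + " " + str(count))
```

If this runs but the Spark version still fails while creating the SparkContext, the problem lies in the environment rather than in the word count logic.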