I'm trying to run the following Python script locally with the spark-submit command:
import sys
sys.path.insert(0, '.')
from pyspark import SparkContext, SparkConf
from commons.Utils import Utils

def splitComma(line):
    splits = Utils.COMMA_DELIMITER.split(line)
    return "{}, {}".format(splits[1], splits[2])

if __name__ == "__main__":
    conf = SparkConf().setAppName("airports").setMaster("local[2]")
    sc = SparkContext(conf = conf)
    airports = sc.textFile("in/airports.text")
    airportsInUSA = airports\
        .filter(lambda line : Utils.COMMA_DELIMITER.split(line)[3] == "\"United States\"")
    airportsNameAndCityNames = airportsInUSA.map(splitComma)
    airportsNameAndCityNames.saveAsTextFile("out/airports_in_usa.text")
The command used (while inside the project directory):
spark-submit rdd/AirportsInUsaSolution.py
I keep getting this error:
Traceback (most recent call last):
  File "/home/gustavo/Documentos/TCC/python_spark_yt/python-spark-tutorial/rdd/AirportsInUsaSolution.py", line 4, in <module>
    from commons.Utils import Utils
ImportError: No module named commons.Utils
This happens even though there is a commons.Utils module containing a Utils class.
It seems that the only imports it accepts are the ones from Spark, because the error persists when I try to import any other class or file from my project.
from pyspark import SparkContext, SparkConf

def splitComma(line):
    splits = Utils.COMMA_DELIMITER.split(line)
    return "{}, {}".format(splits[1], splits[2])

if __name__ == "__main__":
    conf = SparkConf().setAppName("airports").setMaster("local[2]")
    sc = SparkContext(conf = conf)
    # Register the zipped package with Spark, then import from it.
    sc.addPyFile('.../pathto commons.zip')
    # Import the Utils class (not the Utils module) so Utils.COMMA_DELIMITER resolves.
    from commons.Utils import Utils
    airports = sc.textFile("in/airports.text")
    airportsInUSA = airports\
        .filter(lambda line : Utils.COMMA_DELIMITER.split(line)[3] == "\"United States\"")
    airportsNameAndCityNames = airportsInUSA.map(splitComma)
    airportsNameAndCityNames.saveAsTextFile("out/airports_in_usa.text")
Yes, by default it only accepts the imports from Spark. You can zip the required files (Utils, numpy, etc.) and pass the archive with the --py-files parameter of spark-submit:
spark-submit --py-files rdd/file.zip rdd/AirportsInUsaSolution.py
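As a minimal sketch of how such an archive could be built, the snippet below uses Python's standard zipfile module. The helper name zip_package and the output name commons.zip are hypothetical, chosen only for illustration. It assumes the package directory is called commons and keeps that directory at the top level of the archive, so that `from commons.Utils import Utils` can resolve once the zip is on the path.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip the .py files of a package, keeping the package directory
    itself as the top-level entry of the archive (hypothetical helper)."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent, e.g. "commons/Utils.py".
                    zf.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

Calling `zip_package("commons", "commons.zip")` from the project directory would produce an archive that can then be passed as `spark-submit --py-files commons.zip rdd/AirportsInUsaSolution.py`.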
For Python to treat a directory as a package, you need to create an __init__.py file in that directory. The __init__.py file doesn't need to contain anything.
In this case, once you create __init__.py in the commons directory, you will be able to import that package.
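The layout described above can be sketched as follows. The temporary project directory and the GREETING attribute are hypothetical stand-ins, used only so the example runs anywhere. Note that on Python 3.3 and later a directory without __init__.py can still be imported as a namespace package; the error message in the question appears to be in the Python 2 format, where the file is required.

```python
import os
import sys
import tempfile

# Build a throwaway project directory (hypothetical layout, for illustration only).
project_dir = tempfile.mkdtemp()
package_dir = os.path.join(project_dir, "commons")
os.makedirs(package_dir)

# An empty __init__.py marks the directory as a package.
open(os.path.join(package_dir, "__init__.py"), "w").close()

# A stand-in Utils module with a Utils class.
with open(os.path.join(package_dir, "Utils.py"), "w") as f:
    f.write("class Utils(object):\n    GREETING = 'hello'\n")

# Make the project directory importable, much as sys.path.insert(0, '.')
# does when the script is launched from the project root.
sys.path.insert(0, project_dir)

from commons.Utils import Utils
print(Utils.GREETING)
```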
Create a Python script named Utils.py, which will contain:
import re

class Utils():
    COMMA_DELIMITER = re.compile(''',(?=(?:[^"]*"[^"]*")*[^"]*$)''')
Put this Utils.py script in a commons folder, and put that folder in your working directory (type pwd to find out what it is). You can then import the Utils class:
from commons.Utils import Utils
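To see what COMMA_DELIMITER does, here is a small sketch that applies the same pattern to a made-up line. The sample line is hypothetical and is not taken from in/airports.text. The pattern matches only commas that are followed by an even number of double quotes, so commas inside quoted fields are left alone.

```python
import re

# Same pattern as in Utils.py: a comma followed by an even number of
# double quotes up to the end of the line, i.e. a comma outside quotes.
COMMA_DELIMITER = re.compile(''',(?=(?:[^"]*"[^"]*")*[^"]*$)''')

# Hypothetical sample line with a comma inside a quoted field.
line = '1,"Some Airport, Intl","Some City","United States"'

splits = COMMA_DELIMITER.split(line)
print(splits)

# A plain str.split(",") would also cut the quoted airport name in two.
naive = line.split(",")
print(len(splits), len(naive))
```

With this split, index 3 holds the quoted country field, which is why the script compares it against "\"United States\"" with the quotes included.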
Hope it will help you.