
How do I add a Python module from inside conda's site-packages directory to spark-submit?

I need to run a PySpark application (v1.6.3). The --py-files flag accepts .zip, .egg, or .py files. If I have a Python package at /usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy, how would I include the whole package?

Inside this directory, I see several *.py and *.pyc files:

  • fuzz.py
  • process.py
  • StringMatcher.py
  • string_processing.py
  • utils.py

Would I have to include each of these one by one? For example:

spark-submit \
 --py-files /usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/fuzz.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/process.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/StringMatcher.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/string_processing.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/utils.py
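Typing each path by hand is not required, because the comma-separated value for `--py-files` can be generated. The sketch below is a minimal illustration (the helper name `py_files_arg` is hypothetical) that collects only the top-level `.py` files of a directory, skipping `.pyc` files and subdirectories. One caveat: as far as I understand PySpark's behavior, loose `.py` files shipped this way land on the executors' Python path as top-level modules, so code would need `import fuzz` rather than `from fuzzywuzzy import fuzz`, and any module that relies on package-relative imports (`from . import utils`) would fail to import.

```python
import glob
import os


def py_files_arg(package_dir):
    """Return a comma-separated list of the top-level .py files in package_dir.

    Hypothetical helper: the result is meant to be pasted after --py-files.
    Only *.py files directly inside package_dir are included; .pyc files and
    subdirectories are ignored.
    """
    paths = sorted(glob.glob(os.path.join(package_dir, "*.py")))
    return ",".join(paths)
```

For example, `py_files_arg("/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy")` would return one string containing every top-level `.py` path in that directory, including `__init__.py` if present.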

Is there an easier way?

  • Should I try to find a .egg or .zip (e.g. on PyPI) and use that?
  • Can I just zip up this directory and pass that in?
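Zipping the directory can work, provided the package directory itself (not just its contents) sits at the root of the archive: Python's zip importer resolves `import fuzzywuzzy` by looking for `fuzzywuzzy/__init__.py` inside the zip. Below is a minimal sketch using the standard library's `shutil.make_archive`. The helper name `zip_package` is hypothetical, and the approach assumes a pure-Python package, since compiled extension modules cannot be imported from a zip.

```python
import os
import shutil


def zip_package(site_packages, package, out_dir="."):
    """Zip site_packages/<package> so that the archive root contains <package>/.

    Hypothetical helper. Returns the path of the created .zip file.
    """
    # base_name gets ".zip" appended by make_archive, e.g. ./fuzzywuzzy.zip
    base_name = os.path.join(out_dir, package)
    # root_dir is the directory archived from; base_dir is the subdirectory
    # recorded in the archive, so entries look like "<package>/__init__.py".
    return shutil.make_archive(base_name, "zip",
                               root_dir=site_packages, base_dir=package)
```

Assuming the paths from the question, `zip_package("/usr/anaconda2/lib/python2.7/site-packages", "fuzzywuzzy")` would produce `./fuzzywuzzy.zip`, which could then be passed as `spark-submit --py-files fuzzywuzzy.zip my_app.py` (where `my_app.py` stands in for the actual application).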

Any tips or pointers would be greatly appreciated. In practice, I need several more conda-managed Python modules as well.
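With several conda-managed dependencies, one option is to put them all into a single archive. The sketch below (hypothetical helper `bundle_modules`) imports each named module in the current interpreter to find where it lives, then writes its `.py` sources into one zip, keeping each package under its own top-level directory. It assumes the modules are importable from the Python used to run it and are pure Python; non-`.py` files such as compiled extensions, bytecode, and data files are skipped.

```python
import importlib
import os
import zipfile


def bundle_modules(module_names, zip_path):
    """Write the .py sources of each named module/package into one zip.

    Hypothetical helper. Packages keep their directory as the archive's top
    level; single-file modules are stored at the archive root.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in module_names:
            mod = importlib.import_module(name)
            if hasattr(mod, "__path__"):
                # A package: copy every .py file under its directory.
                pkg_dir = list(mod.__path__)[0]
                parent = os.path.dirname(pkg_dir)
                for dirpath, dirnames, filenames in os.walk(pkg_dir):
                    # Skip bytecode caches; they are interpreter-specific.
                    dirnames[:] = [d for d in dirnames if d != "__pycache__"]
                    for fn in sorted(filenames):
                        if fn.endswith(".py"):
                            full = os.path.join(dirpath, fn)
                            zf.write(full, os.path.relpath(full, parent))
            else:
                # A single-file module; Python 2 may report the .pyc path.
                src = mod.__file__
                if src.endswith(".pyc"):
                    src = src[:-1]
                zf.write(src, os.path.basename(src))
    return zip_path
```

For example, `bundle_modules(["fuzzywuzzy"], "deps.zip")`, with further module names appended as needed, produces one `deps.zip` to pass to `--py-files`.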

I suggest going in the other direction: install PySpark into Anaconda with:

conda install -c conda-forge pyspark=2.1.1
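With this approach PySpark and the conda-managed packages live in the same environment, so the driver's interpreter should be able to import both directly. A quick way to confirm that on a given machine is sketched below. It assumes Python 3, where `importlib.util.find_spec` is available (it does not exist on Python 2.7), and the helper name `locate` is hypothetical.

```python
import importlib.util


def locate(names):
    """Map each top-level module name to the file it would be loaded from.

    Hypothetical helper. A value of None means the module is not importable
    from the current interpreter.
    """
    result = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        result[name] = spec.origin if spec is not None else None
    return result
```

Running `locate(["pyspark", "fuzzywuzzy"])` from the Anaconda Python should show both resolving under the same site-packages directory if the install worked.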
