

Why does spark-submit in YARN cluster mode not find python packages on executors?

I am running a boo.py script on AWS EMR using spark-submit (Spark 2.0).

The script finishes successfully when I run

python boo.py

However, it fails when I run

spark-submit --verbose --deploy-mode cluster --master yarn boo.py

The output of yarn logs -applicationId ID_number shows:

Traceback (most recent call last):
File "boo.py", line 17, in <module>
import boto3
ImportError: No module named boto3

The Python interpreter and boto3 module I am using are:

$ which python
$ pip install boto3
Requirement already satisfied (use --upgrade to upgrade): boto3 in /usr/local/lib/python2.7/site-packages

How do I add this library path so that spark-submit can find the boto3 module?

When you run Spark, part of the code runs on the driver and part runs on the executors. With --deploy-mode cluster, the driver itself also runs on a cluster node rather than on the machine where you ran spark-submit.

Did you install boto3 only on the driver, or on the driver and on all the executors (nodes) that might run your code?
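One way to answer that question is to ask each process directly. The following is a minimal sketch, not something from the original post: probe_module and probe_cluster are hypothetical helper names, and the sketch assumes you already have a SparkContext (called sc here) in your script. It only uses the standard library plus the parallelize, mapPartitions, distinct and collect calls on the SparkContext/RDD.

```python
import importlib
import socket
import sys


def probe_module(module_name):
    """Report whether module_name can be imported in the current process.

    Returns (hostname, python_executable, found).
    """
    try:
        importlib.import_module(module_name)
        found = True
    except ImportError:
        found = False
    return (socket.gethostname(), sys.executable, found)


def probe_cluster(sc, module_name, num_tasks=20):
    """Run probe_module inside num_tasks Spark tasks and return distinct results.

    sc is an existing SparkContext (an assumption of this sketch).
    num_tasks is a guess: Spark decides where tasks run, so a larger value
    makes it more likely, but does not guarantee, that every node is probed.
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return (rdd.mapPartitions(lambda _rows: [probe_module(module_name)])
               .distinct()
               .collect())
```

If you paste these helpers into boo.py, call print(probe_module("boto3")) for the driver and print(probe_cluster(sc, "boto3")) for the executors, and make those calls before the top-level import boto3 so the script gets far enough to report. Any host that reports False, or a Python executable other than the one where pip installed boto3, is a node that still needs the package.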

One solution might be to install boto3 on all executors (nodes).

For how to install Python modules on Amazon EMR nodes, see:

How to bootstrap installation of Python modules on Amazon EMR?
