I am currently running spark-submit jobs on an AWS EMR cluster. I have started running into Python package issues where a module is not found during imports.
One obvious solution would be to log into each individual node and install my dependencies, but I would like to avoid that if possible. Another option is to write a bootstrap script and create a new cluster.
The last solution that seems to work is to pip install my dependencies locally, zip them, and pass the archive to the spark-submit job via --py-files. However, that may become cumbersome as my requirements grow.
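For reference, the --py-files workflow described above looks roughly like this (the package names and file paths are illustrative):

```shell
# Install dependencies into a local directory (package names are examples)
pip install -t ./deps requests boto3

# Zip them so Spark can ship the archive to the executors
cd deps && zip -r ../deps.zip . && cd ..

# Pass the archive to spark-submit; its contents are added to the
# Python path on the workers
spark-submit --py-files deps.zip my_job.py
```

Note that this approach is only reliable for pure-Python packages; libraries with compiled extensions (numpy, pandas, etc.) generally need to be installed on the nodes themselves.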
Any other suggestions or easy fixes I may be overlooking?
Bootstrap is the solution. Write a shell script that pip installs all your required packages and supply it as a bootstrap action. It will be executed on all nodes when you create the cluster. Just keep in mind that if the bootstrap action takes too long (an hour or so?), it will fail.
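A minimal bootstrap script might look like this (the package list is an example; adjust it to your requirements):

```shell
#!/bin/bash
# EMR bootstrap action: runs on every node when the cluster is created.
# Fail fast if any install step errors out.
set -e

# Use the pip belonging to the Python interpreter Spark uses
# (python3 on recent EMR releases).
sudo pip3 install \
    requests \
    boto3 \
    pandas
```

Upload the script to S3 and reference it when creating the cluster, for example with `aws emr create-cluster ... --bootstrap-actions Path=s3://your-bucket/install-deps.sh` (bucket name and script path are placeholders here).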