
How to connect to Spark running within a Docker instance

I'm trying to stand up Spark within a Docker container, then connect to it from an external Python process.

Context: this setup is important for CI/CD of Spark-based code in Travis. I'm also hoping to use it to establish a consistent dev environment for a distributed team.

How do I do this?

This docker image has been lovely for spinning up spark: https://hub.docker.com/r/jupyter/pyspark-notebook/

Connecting via the dockerized notebook worked right out of the box. (I'm not actually using notebooks for anything other than debugging, so I might remove them later. For now, they're a handy debugging tool.)

I haven't been able to connect from an external Python process (notebook or otherwise). Is there an environment variable that I need to set when I start Python or instantiate my SparkContext?

Did you expose the Spark ports correctly? Looking at the link you shared ( https://hub.docker.com/r/jupyter/pyspark-notebook/ ), I cannot make out how you are starting the containers. You need to expose the Spark master port to the host and then use it from your Python code. Can you share the command you are using to start the containers (or your docker-compose.yml)? Also share the URL you are using in your Python code.
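
For illustration, here is a minimal sketch of the client side. It assumes a Spark standalone master is running inside the container and its default port 7077 is published to the host (for example with -p 7077:7077 in the docker run command, or a ports: entry in docker-compose.yml); the hostname, port, and app name below are placeholders, not the questioner's actual setup:

    # Minimal sketch: assumes a standalone Spark master is reachable from the
    # host at localhost:7077 (i.e. the container publishes that port).
    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setAppName("external-client")          # placeholder app name
        .setMaster("spark://localhost:7077")    # master URL as seen from the host
    )
    sc = SparkContext(conf=conf)

    # Quick sanity check that the master actually accepts work
    print(sc.parallelize(range(100)).sum())
    sc.stop()

The same master URL can alternatively be passed on the command line via spark-submit --master spark://localhost:7077 instead of hard-coding it with setMaster.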
