How to connect to spark running within a docker instance
I'm trying to stand up Spark within a Docker container, then connect to it from an external Python process.
Context: this setup is important for CI/CD of Spark-based code in Travis. I'm also hoping to use it to establish a consistent dev environment for a distributed team.
How do I do this?
This docker image has been lovely for spinning up Spark: https://hub.docker.com/r/jupyter/pyspark-notebook/
Connecting via the dockerized notebook worked right out of the box. (Aside from debugging, I'm not actually using notebooks, so I might remove them later. For now, they're a good debugging tool.)
I haven't been able to connect from an external Python process (notebook or otherwise). Is there an environment variable that I need to set when I start Python or instantiate my SparkContext?
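For reference, a minimal sketch of the kind of external connection I mean. The host, port, and environment variable names here are assumptions for illustration (a standalone master on its default port 7077, published on localhost), not a description of my actual setup:

```python
import os

# Hypothetical configuration: a standalone Spark master published by the
# container on the docker host. Defaults below are assumptions.
master_host = os.environ.get("SPARK_MASTER_HOST", "localhost")
master_port = os.environ.get("SPARK_MASTER_PORT", "7077")
master_url = "spark://{}:{}".format(master_host, master_port)
print(master_url)

# With the master reachable from the host, the external process would
# point its context at that URL, e.g.:
# from pyspark import SparkContext
# sc = SparkContext(master=master_url, appName="external-client")
```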
Did you expose the Spark ports correctly? Looking at the link you shared (https://hub.docker.com/r/jupyter/pyspark-notebook/), I cannot make out how you are starting the containers. You need to expose the Spark master port to the host and then use it from your Python code. Can you share the command you are using to start the containers (or your docker-compose.yml)? Also share the URL you are using from your Python code.
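To make the suggestion concrete, here is a minimal docker-compose.yml sketch of what "expose the master port to the host" could look like. The port numbers are Spark's standalone defaults and the image entrypoint is assumed to start a master; adjust both to your actual setup:

```yaml
# Hypothetical docker-compose.yml: publish the Spark and notebook ports
# so an external python process on the host can reach the container.
version: "3"
services:
  pyspark:
    image: jupyter/pyspark-notebook
    ports:
      - "8888:8888"   # Jupyter notebook UI
      - "7077:7077"   # Spark master port (standalone mode, default)
      - "4040:4040"   # Spark application web UI
```

From the host, your Python code would then use a master URL such as `spark://localhost:7077` (again assuming a standalone master is actually listening on that port inside the container).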