PySpark: how to execute a job in standalone mode with user-defined modules?
I have installed Spark on several machines to use them in standalone cluster mode, so I now have a set of machines that all run the same Spark build (Spark 2.4.0 built against Hadoop 2.7+).
I want to use this cluster for parallel data analysis, and since my working language is Python I am using PySpark rather than the Scala/Java Spark API. I have created some modules containing the operations that process the data and shape it into the form I want.
However, I don't want to manually copy all of these modules to every machine, so I would like to know which options PySpark offers for shipping dependencies, so that I can be sure the modules are present on every executor.
I have thought about virtual environments that would be activated with the modules installed, but I don't know how to do that in Spark standalone mode; the YARN resource manager seems to offer such an option, but I won't be installing YARN.
PS: some modules use data files such as .txt files, and some use dynamic libraries such as .dll and .so files, and I want these to be shipped to the executors as well.
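For context (this is not from the question itself): in standalone mode the usual approach is to bundle your Python modules into a .zip and ship it with `spark-submit --py-files`, while plain data files and native libraries go through `--files`; the same can be done at runtime with `SparkContext.addPyFile` and `addFile`. A minimal sketch of the bundling step, using only the standard library (the `mymodules` package name and the file paths are made up for illustration):

```python
import os
import shutil
import tempfile

def bundle_package(package_dir: str, out_dir: str) -> str:
    """Zip a Python package directory so it can be shipped to
    executors via --py-files or SparkContext.addPyFile."""
    base = os.path.join(out_dir, os.path.basename(package_dir.rstrip("/")))
    # shutil.make_archive appends ".zip" to `base`; zipping relative to the
    # package's parent keeps the importable top-level name inside the archive.
    return shutil.make_archive(
        base,
        "zip",
        root_dir=os.path.dirname(os.path.abspath(package_dir)),
        base_dir=os.path.basename(package_dir.rstrip("/")),
    )

if __name__ == "__main__":
    # Build a throwaway package just to demonstrate the bundling step.
    work = tempfile.mkdtemp()
    pkg = os.path.join(work, "mymodules")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("VERSION = '0.1'\n")
    print(bundle_package(pkg, work))
```

The resulting archive can then be submitted along with data files and shared libraries, for example `spark-submit --master spark://<master>:7077 --py-files mymodules.zip --files lookup.txt,libfast.so job.py`; on the executors, `mymodules` becomes importable and `SparkFiles.get("lookup.txt")` resolves the shipped files.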
A good solution for distributing Spark together with your modules is Docker Swarm (assuming you have some experience with Docker).
Have a look at this repository; it was very useful for me at the time: https://github.com/big-data-europe/docker-spark
It is a good base for distributing Spark, and on top of it you can add your own modules. You then build your own Docker images, push them to Docker Hub (or another registry), and distribute them across your cluster with Docker Swarm.
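As a rough sketch of what such an image might look like (the base image tag, the environment variable, and all file names below are assumptions for illustration, not taken from the answer; check the big-data-europe repository for its current image tags and conventions):

```dockerfile
# Hypothetical Dockerfile: base tag and paths are assumptions.
FROM bde2020/spark-python-template:2.4.0-hadoop2.7

# Bake your own modules, data files and native libraries into the image,
# so every Swarm node that pulls it has identical dependencies.
COPY mymodules/ /app/mymodules/
COPY data/lookup.txt /app/data/
COPY lib/libfast.so /usr/local/lib/
COPY job.py /app/

ENV PYTHONPATH=/app
ENV SPARK_APPLICATION_PYTHON_LOCATION=/app/job.py
```

After `docker build -t <your-hub-user>/spark-job:latest .` and `docker push`, the image can be rolled out on the cluster with `docker stack deploy` or `docker service create`, so every node runs the same environment.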