Using LD_PRELOAD with Apache Spark (or YARN)
We are running Spark jobs on Apache Hadoop YARN. I have a special need to use the "LD_PRELOAD trick" on these jobs. (Before anyone panics, it's not for production runs; this is part of automated job testing.)
I know how to submit additional files with the job, and I know how to set environment variables on the nodes, so adding these settings to spark-defaults.conf almost provides a solution:
spark.files=/home/todd/pwn_connect.so
spark.yarn.appMasterEnv.LD_PRELOAD=pwn_connect.so
spark.executorEnv.LD_PRELOAD=pwn_connect.so
But I get this error in the container logs:
ERROR: ld.so: object 'pwn_connect.so' from LD_PRELOAD cannot be preloaded: ignored.
The problem seems to be that LD_PRELOAD doesn't accept the relative path that I'm providing. But I don't know how to provide an absolute path -- I don't have a clue where on the local filesystem of the nodes these files are being placed.
Firstly, spark.files is not used when running on YARN; it should be spark.yarn.dist.files. And note that this will be overwritten if the --files argument is provided to spark-submit.
For LD_PRELOAD, there are two solutions that will work:
Relative paths can be used; they need to be prefixed with ./:

spark.yarn.dist.files=/home/todd/pwn_connect.so
spark.yarn.appMasterEnv.LD_PRELOAD=./pwn_connect.so
spark.executorEnv.LD_PRELOAD=./pwn_connect.so

(Relative paths without ./ are searched for in LD_LIBRARY_PATH, rather than in the current working directory.)
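This ./ behavior can be checked outside of Spark with a throwaway shared library (the names below are illustrative, not the poster's actual pwn_connect.so):

```shell
# Build a tiny shared object whose constructor announces itself.
cat > hello_preload.c <<'EOF'
#include <stdio.h>
__attribute__((constructor))
static void announce(void) {
    fprintf(stderr, "hello_preload loaded\n");
}
EOF
gcc -shared -fPIC -o hello_preload.so hello_preload.c

# Bare name: ld.so searches the library path, not the CWD,
# so this reproduces the "cannot be preloaded: ignored" error.
LD_PRELOAD=hello_preload.so /bin/true

# ./ prefix: resolved against the current working directory,
# so the constructor runs and prints its message.
LD_PRELOAD=./hello_preload.so /bin/true
```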
If an absolute path is preferred, examining the Spark source code reveals that the whole command line, including the environment variable assignments, is subject to expansion by the shell, so the expression $PWD will be expanded to the current working directory:

spark.yarn.dist.files=/home/todd/pwn_connect.so
spark.yarn.appMasterEnv.LD_PRELOAD=$PWD/pwn_connect.so
spark.executorEnv.LD_PRELOAD=$PWD/pwn_connect.so
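That the shell really expands $PWD inside an environment-variable assignment is easy to confirm locally; the .so here need not even exist for `env` to show the expanded value:

```shell
# The shell expands $PWD in the assignment before launching the child
# process; `env` then prints the absolute path it actually received.
# (ld.so will warn on stderr that the nonexistent .so was ignored,
# which is harmless for this demonstration.)
sh -c 'LD_PRELOAD=$PWD/pwn_connect.so env' 2>/dev/null | grep '^LD_PRELOAD='
```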