
Using LD_PRELOAD with Apache Spark (or YARN)

We are running Spark jobs on Apache Hadoop YARN. I have a special need to use the "LD_PRELOAD trick" on these jobs. (Before anyone panics, it's not for production runs; this is part of automated job testing.)
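For readers unfamiliar with the trick: LD_PRELOAD is a dynamic-linker environment variable that loads the named shared object before all other libraries, so any symbols it defines override the normal definitions (for example, interposing libc's connect). A minimal local sketch, where shim.c and some_program are placeholders:

# Build the shim as a shared object
gcc -shared -fPIC -o pwn_connect.so shim.c

# Run a program with the shim preloaded; symbols defined in
# pwn_connect.so take precedence over those from libc
LD_PRELOAD=/home/todd/pwn_connect.so some_program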

I know how to submit additional files with the job, and I know how to set environment variables on the nodes, so adding these settings to spark-defaults.conf almost provides a solution:

spark.files=/home/todd/pwn_connect.so
spark.yarn.appMasterEnv.LD_PRELOAD=pwn_connect.so
spark.executorEnv.LD_PRELOAD=pwn_connect.so
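
The same settings can also be passed per job on the spark-submit command line instead of spark-defaults.conf; a sketch of that form, with my_job.py standing in for the actual application:

spark-submit \
  --conf spark.files=/home/todd/pwn_connect.so \
  --conf spark.yarn.appMasterEnv.LD_PRELOAD=pwn_connect.so \
  --conf spark.executorEnv.LD_PRELOAD=pwn_connect.so \
  my_job.py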

But I get this error in the container logs:

ERROR: ld.so: object 'pwn_connect.so' from LD_PRELOAD cannot be preloaded: ignored.

The problem seems to be that LD_PRELOAD doesn't accept the relative path that I'm providing. But I don't know how to provide an absolute path: I don't have a clue where on the local filesystem of the nodes these files are being placed.

Firstly, spark.files is not used when running on YARN; it should be spark.yarn.dist.files. Note also that this will be overwritten if the --files argument is provided to spark-submit.

For LD_PRELOAD, there are two solutions that will work:

  1. Relative paths can be used; they need to be prefixed with ./ (see the full spark-submit sketch after this list):

     spark.yarn.dist.files=/home/todd/pwn_connect.so
     spark.yarn.appMasterEnv.LD_PRELOAD=./pwn_connect.so
     spark.executorEnv.LD_PRELOAD=./pwn_connect.so

    (relative paths without ./ are searched for in LD_LIBRARY_PATH, rather than the current working directory).

  2. If an absolute path is preferred, examining the Spark source code reveals that the whole command line, including the environment variable assignments, is subject to expansion by the shell, so the expression $PWD will be expanded to the container's current working directory:

     spark.yarn.dist.files=/home/todd/pwn_connect.so
     spark.yarn.appMasterEnv.LD_PRELOAD=$PWD/pwn_connect.so
     spark.executorEnv.LD_PRELOAD=$PWD/pwn_connect.so
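
Putting solution 1 together on the command line, an invocation might look like the following sketch (my_job.py is a placeholder application):

spark-submit \
  --master yarn \
  --conf spark.yarn.dist.files=/home/todd/pwn_connect.so \
  --conf spark.yarn.appMasterEnv.LD_PRELOAD=./pwn_connect.so \
  --conf spark.executorEnv.LD_PRELOAD=./pwn_connect.so \
  my_job.py

If the $PWD form from solution 2 is used on the command line instead, single-quote the value (e.g. --conf spark.executorEnv.LD_PRELOAD='$PWD/pwn_connect.so') so that the submitting shell does not expand $PWD locally.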
