
How to load java properties file and use in Spark?

I want to store the Spark arguments such as input file, output file into a Java properties file and pass that file to the Spark driver. I'm using spark-submit for submitting the job but couldn't find a parameter to pass the properties file. Have you got any suggestions?

Here I found one solution:

Props file (mypropsfile.conf). Note: prefix your keys with "spark.", else the properties will be ignored.

spark.myapp.input /input/path
spark.myapp.output /output/path

Launch:

$SPARK_HOME/bin/spark-submit --properties-file  mypropsfile.conf

How to access the properties in code:

sc.getConf.get("spark.driver.host")  // localhost
sc.getConf.get("spark.myapp.input")       // /input/path
sc.getConf.get("spark.myapp.output")      // /output/path

The previous answer's approach has the restriction that every property in the properties file should start with spark., e.g.

spark.myapp.input
spark.myapp.output

Suppose you have a property which doesn't start with spark.:

job.property:

app.name=xyz

$SPARK_HOME/bin/spark-submit --properties-file  job.property

Spark will ignore all properties that don't have the spark. prefix, with the message:

Warning: Ignoring non-spark config property: app.name=test

How I manage the properties file in the application's driver and executors:

${SPARK_HOME}/bin/spark-submit --files job.properties

Java code to access the cached file (job.properties):

import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkFiles;
import java.io.InputStream;
import java.io.FileInputStream;

// Load the file into a Properties object using the HDFS FileSystem API
String fileName = SparkFiles.get("job.properties");
Configuration hdfsConf = new Configuration();
FileSystem fs = FileSystem.get(hdfsConf);

// SparkFiles.get returns the absolute path of the cached file
FSDataInputStream is = fs.open(new Path(fileName));

// Or use plain Java IO instead:
// InputStream is = new FileInputStream("/res/example.xls");

Properties prop = new Properties();
// load the properties
prop.load(is);
// retrieve a property
prop.getProperty("app.name");
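
The stream above is never closed; a minimal try-with-resources variant of the same load (reusing the fs and --files setup from above):

Properties prop = new Properties();
try (InputStream in = fs.open(new Path(SparkFiles.get("job.properties")))) {
    prop.load(in); // stream is closed automatically
}
String appName = prop.getProperty("app.name"); // e.g. "xyz"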

If you have environment-specific properties (dev/test/prod), then supply an APP_ENV custom Java system property in spark-submit:

${SPARK_HOME}/bin/spark-submit \
--conf "spark.driver.extraJavaOptions=-DAPP_ENV=dev" \
--conf "spark.executor.extraJavaOptions=-DAPP_ENV=dev" \
--properties-file  dev.property

Then adjust the lookup in your driver or executor code:

// Load the environment-specific file into a Properties object
String fileName = SparkFiles.get(System.getProperty("APP_ENV") + ".properties");
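
A defensive variant can fall back to a default environment when -DAPP_ENV is not supplied; the "dev" default here is an assumption:

// Fall back to "dev" if -DAPP_ENV was not set (assumed default)
String env = System.getProperty("APP_ENV", "dev");
String fileName = SparkFiles.get(env + ".properties");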
