
How to load typesafe ConfigFactory from file on HDFS?

I am using Typesafe's ConfigFactory to load the config into my Scala application. I do not want to bundle the config files into my jar, but instead load them from an external HDFS filesystem. However, I cannot find a simple way to load the config from the FSDataInputStream object I get from Hadoop:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}
import com.typesafe.config.{Config, ConfigFactory}

//get HDFS file
val hadoopConfig: Configuration = sc.hadoopConfiguration
val fs: FileSystem = FileSystem.get(hadoopConfig)
val file: FSDataInputStream = fs.open(new Path("hdfs://SOME_URL/application.conf"))
//read config from hdfs
// readUTF() expects a length-prefixed record written by DataOutput.writeUTF,
// not a plain text file
val config: Config = ConfigFactory.load(file.readUTF())

However, this throws an EOFException. Is there an easy way to convert the FSDataInputStream object into the required java.io.File? I found Converting from FSDataInputStream to FileInputStream, but this would be pretty cumbersome for such a simple task.

Using ConfigFactory.parseReader should work (though I haven't tested it):

import java.io.InputStreamReader
import com.typesafe.config.{Config, ConfigFactory}

val reader = new InputStreamReader(file) // file: the FSDataInputStream from fs.open(...)
val config: Config = try {
  ConfigFactory.parseReader(reader)
} finally {
  reader.close()
}
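On Scala 2.13 or later, the same close-the-reader pattern can be written with scala.util.Using, which closes the reader even if parsing throws. This is a sketch, not tested against a real cluster; fs is assumed to be the FileSystem instance from the question:

```scala
import java.io.InputStreamReader
import scala.util.Using
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.hadoop.fs.{FileSystem, Path}

// Open the HDFS file, parse it as HOCON, and close the reader afterwards,
// whether parseReader succeeds or throws.
def loadConfig(fs: FileSystem, path: String): Config =
  Using.resource(new InputStreamReader(fs.open(new Path(path)))) { reader =>
    ConfigFactory.parseReader(reader)
  }
```

Using.resource rethrows any parse error after closing the reader, so no explicit try/finally is needed.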

I could fix the issue with the code below. Assume configPath is the HDFS path where the .conf file is available, e.g. hdfs://mount-point/abc/xyz/details.conf:

import java.io.InputStreamReader
import com.typesafe.config.{Config, ConfigFactory}
import org.apache.hadoop.fs.{FileSystem, Path}

val configPath = "hdfs://sparkmainserver:8020/file.conf"
val fs = FileSystem.get(new org.apache.hadoop.conf.Configuration())
val reader = new InputStreamReader(fs.open(new Path(configPath)))
val config: Config = try ConfigFactory.parseReader(reader) finally reader.close()

Then you can use config.getString("variable_name") to extract and use the variables/parameters. For this you need the Typesafe Config dependency in your sbt or Maven build file.
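For illustration, a hypothetical details.conf could look like this (all keys here are invented, not from the original answer):

```hocon
# hypothetical details.conf
db {
  url  = "jdbc:postgresql://dbhost:5432/app"
  user = "app_user"
}
```

config.getString("db.url") would then return the JDBC URL string.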

Here is what I did in my Spark application:

  /**
    * Load a Typesafe config from a file stored on HDFS.
    * @param sparkContext the active SparkContext
    * @param confHdfsFileLocation HDFS path of the .conf file
    * @return the parsed Config
    */
  def loadHdfsConfig(sparkContext: SparkContext, confHdfsFileLocation: String): Config = {
    // wholeTextFiles returns (fileName, fileContent) pairs; for a single
    // conf file this collects to an array of one element
    val appConf: Array[(String, String)] = sparkContext.wholeTextFiles(confHdfsFileLocation).collect()
    val appConfStringContent = appConf(0)._2
    ConfigFactory.parseString(appConfStringContent)
  }

Now in the code, just use:

val config = loadHdfsConfig(sparkContext, confHdfsFileLocation)
config.getString("key-here")
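One refinement (my addition, not part of the original answer): the HDFS-loaded config can be merged with defaults bundled in the jar via withFallback, so that keys present in the HDFS file override the bundled ones. Sketched here with parseString standing in for the HDFS load:

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Merge two configs: entries in `primary` override entries in `defaults`.
// In the Spark app, `primary` would come from loadHdfsConfig(...) instead.
val primary: Config  = ConfigFactory.parseString("""spark.master = "yarn"""")
val defaults: Config = ConfigFactory.parseString(
  """
    |spark.master = "local[*]"
    |app.name     = "demo"
  """.stripMargin)

val merged: Config = primary.withFallback(defaults).resolve()
// merged.getString("spark.master") is "yarn" (overridden by primary)
// merged.getString("app.name") is "demo" (taken from defaults)
```

resolve() is only needed if the files use ${...} substitutions, but it is harmless otherwise.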

I hope it helps.

You should be able to load the .conf file with the following code:

ConfigFactory.parseFile(new File("application.conf"));

Please keep in mind that the .conf file should be placed in the same directory as your app file (e.g. the jar file in Spark).
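One common way to get the .conf file next to the application at runtime is spark-submit's --files option, which copies the listed files into each executor's working directory (the class and jar names below are placeholders):

```
spark-submit \
  --class com.example.Main \
  --files application.conf \
  my-app.jar
```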
