
Populate a Properties Object from Spark Databricks File System

TL;DR

Is there a way to read a Scala/Java properties file from the Databricks file system?
Or, is there a way to convert Spark DataFrame Rows into a set of text key/value pairs (that Scala will understand)?

Full Problem:

The properties file is not local; it's on the Databricks cluster. Attempts to read the file from "dbfs:/" or "/dbfs" fail to find it when using the scala.io.Source library. My guess is that Scala's Source has no ability to recognize the URI for the Databricks file system(?).
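For reference, this is the kind of call that fails (a minimal sketch; the path is hypothetical):

// Throws FileNotFoundException on the cluster: scala.io.Source goes through java.io,
// which doesn't understand the dbfs: URI scheme. The path below is hypothetical.
val source = scala.io.Source.fromFile("dbfs:/config/app.properties")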

I'm able to read the file into a Spark DataFrame, however, but attempts to populate a java.util.Properties object fail with an error that it doesn't accept the Spark DataFrame Row type. I've tried converting the data frame to an Array and a List, but I run into the same type mismatch: java.util.List[org.apache.spark.sql.Row], for example, is what I get when converting the data frame to a list. I'm guessing that means dataFrameObject.collectAsList() makes a list of Spark Rows instead of a text list of key/value pairs.
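A minimal sketch of the type that comes back (assuming a dataframe named df):

// Each element is an org.apache.spark.sql.Row, not a String,
// so java.util.Properties.load() can't consume the list directly.
val rows: java.util.List[org.apache.spark.sql.Row] = df.collectAsList()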

Obviously I'm new to Scala... If there isn't a way to read/load my properties file directly from DBFS, is there a way to convert the Spark Rows to key/value pairs - or to a byte stream?

Cheers and thanks, Simon

If you're using the full version of Databricks, not the community edition, then you should be able to access files on DBFS via /dbfs/_the_rest_of_your_path_without_dbfs:/_...
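For example, with the /dbfs FUSE mount available, plain Java file I/O works directly (a minimal sketch; the path is hypothetical):

// /dbfs/... is the FUSE mount of dbfs:/ on full (non-community) Databricks clusters.
val props = new java.util.Properties()
val in = new java.io.FileInputStream("/dbfs/config/app.properties")
try props.load(in) finally in.close()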

But if you can't access /dbfs/..., then you can still load the properties as follows:

  1. load the file into Spark using the text format, which converts every line in the file into an individual row
  2. create text from those rows - first collect all rows to the driver node, then extract the string from each row (using .getString(0) to fetch the first element of the row), and then merge all lines together using mkString
  3. create a reader for that text
  4. create a Properties object and load the data from the reader (don't forget to close the reader after use):
// path is elided in the original; replace with your actual DBFS location
val path_to_file = "dbfs:/something...."
// read each line of the file as one row of a single-column dataframe
val df = spark.read.format("text").load(path_to_file)
// collect rows to the driver, extract the string from each row, and join with newlines
val allText = df.collect().map(_.getString(0)).mkString("\n")
val reader = new java.io.StringReader(allText)
val props = new java.util.Properties()
props.load(reader)
reader.close()

and you can check that the properties are loaded with:

props.list(System.out)
