
Create views for two different DataFrames in Scala Spark

I have a code snippet that reads a JSON array of file paths, unions the output, and gives me two different tables. I want to call createOrReplaceTempView(name) for each of those two tables, where the name comes from the JSON array below:

    {
        "source": [
            {
                "name": "testPersons",
                "data": [
                    "E:\\dataset\\2020-05-01\\",
                    "E:\\dataset\\2020-05-02\\"
                ],
                "type": "json"
            },
            {
                "name": "testPets",
                "data": [
                    "E:\\dataset\\2020-05-01\\078\\",
                    "E:\\dataset\\2020-05-02\\078\\"
                ],
                "type": "json"
            }
        ]
    }

My output:

testPersons
        +------+---+
        |name  |age|
        +------+---+
        |John  |24 |
        |Cammy |20 |
        |Britto|30 |
        |George|23 |
        |Mikle |15 |
        +------+---+

testPets
        +------+---+
        |name  |age|
        +------+---+
        |piku  |2  |
        |jimmy |3  |
        |rapido|1  |
        +------+---+

Above are my output tables and the JSON array. My code iterates through each array element and reads the paths listed in its data section. How do I change my code below so that it creates a temp view for each output table? For example, I want to call .createOrReplaceTempView("testPersons") and .createOrReplaceTempView("testPets"), with the view names taken from the JSON array.

if (dataArr(counter)("type").value.toString() == "json") {
  val name = dataArr(counter)("name").value.toString()
  val dataPath = dataArr(counter)("data").arr
  // Read each path, first rewriting concatenated JSON objects into a single JSON array string
  val input = dataPath.map(item => {
    val rdd = spark.sparkContext.wholeTextFiles(item.str)
      .map(i => "[" + i._2.replaceAll("\\}.*\n{0,}.*\\{", "},{") + "]")
    spark
      .read
      .schema(Schema.getSchema(name))
      .option("multiLine", true)
      .json(rdd)
  })
  // Union the per-path DataFrames into one DataFrame
  val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], Schema.getSchema(name))
  val finalDF = input.foldLeft(emptyDF)((x, y) => x.union(y))
  finalDF.show()
}

Expected output:

 spark.sql("SELECT * FROM testPersons").show()
 spark.sql("SELECT * FROM testPets").show()

It should give me the table for each one.

Since you already have your data wrangled into shape and have your rows in DataFrames, and simply want to access them as temporary views, I suppose you are looking for the following functions. They can be invoked on a DataFrame/Dataset:

df.createOrReplaceGlobalTempView("testPersons")
spark.sql("SELECT * FROM global_temp.testPersons").show()

df.createOrReplaceTempView("testPersons")
spark.sql("SELECT * FROM testPersons").show()

For an explanation of the difference between the two, you can take a look at this question.
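As a minimal sketch of that difference (assuming an existing SparkSession spark and a DataFrame df; the view names here are just illustrative): a plain temp view is scoped to the session that created it, while a global temp view is registered in the global_temp database and is visible from other sessions of the same application.

df.createOrReplaceTempView("testPersons")             // session-scoped
df.createOrReplaceGlobalTempView("testPersonsGlobal") // application-scoped, lives in global_temp

val otherSession = spark.newSession()
otherSession.sql("SELECT * FROM global_temp.testPersonsGlobal").show() // visible here
// otherSession.sql("SELECT * FROM testPersons").show()                // would fail: view not found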


If you are trying to dynamically read the JSON config, load the files listed under data into DataFrames, and then register each one as its own view, you can do something like this:

import net.liftweb.json._
import net.liftweb.json.DefaultFormats

case class Source(name: String, data: List[String], `type`: String)

// Parse the JSON config file
val file = scala.io.Source.fromFile("path/to/your/file").mkString
implicit val formats: DefaultFormats.type = DefaultFormats
val json = parse(file)
val sourceList = (json \ "source").children

for (source <- sourceList) {
  val s = source.extract[Source]
  // Read every path listed under "data" and union them into one DataFrame
  val df = s.data.map(d => spark.read.json(d)).reduce(_ union _)
  // Register the result under the name given in the config
  df.createOrReplaceTempView(s.name)
}
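After the loop has run, each view can be queried by the name taken from the JSON config, exactly as in the expected output above:

spark.sql("SELECT * FROM testPersons").show()
spark.sql("SELECT * FROM testPets").show()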
