
Access a Hive table with a JSON SerDe via Spark SQL

I am new to the Spark world. How can a Hive table that uses a JSON SerDe be read via Spark SQL? Any example code or documentation would help.

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object ReadJson {
  val spark = SparkSession // Build the SparkSession
    .builder()
    .appName("ReadJson")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions", "4") // Fewer shuffle partitions than the default (200), which suits small local data
    .config("spark.app.id", "ReadJson") // Setting an app id silences the metrics warning
    .getOrCreate()

  val input = "hdfs://user/..../..../..../file.json" // HDFS path to the file or directory

  def main(args: Array[String]): Unit = {

    Logger.getRootLogger.setLevel(Level.ERROR) // Only show errors in the application logs

    try {

      val jsonDf = spark
        .read
        .json(input) // Read the JSON into a DataFrame; expects one JSON object per line
                     // (add .option("multiLine", true) for documents spanning several lines)

      jsonDf.show(truncate = false) // Print some rows to the console

      jsonDf.createOrReplaceTempView("my_table") // Register a temporary view so the data can be queried with SQL

      spark.sql("""SELECT * FROM my_table""").show() // Simple query

      // Keep the application alive so the Spark web UI can be inspected
      // (http://localhost:4040/ by default, or the next free port such as 4041)
      println("Press Enter to exit...")
      scala.io.StdIn.readLine()
    } finally {
      spark.stop() // Also stops the underlying SparkContext
      println("SparkSession stopped")
    }
  }
}
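The code above reads the JSON files directly, so Hive's table definition (and its SerDe) is not involved at all. To query the Hive table itself, a sketch under some assumptions: build the session with `enableHiveSupport()`, which gives Spark access to the Hive metastore and Hive SerDes, then read the table by name. The table name `mydb.my_json_table` and the helper `readTable` are hypothetical. It also assumes `spark-hive` is on the classpath, `hive-site.xml` is visible to Spark (otherwise Spark creates a fresh local metastore instead of using yours), and the jar containing the table's SerDe class is available to Spark (for `org.apache.hive.hcatalog.data.JsonSerDe` that is typically the `hive-hcatalog-core` jar).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadHiveJsonTable {
  // Hypothetical table name; replace with your own database.table
  val tableName = "mydb.my_json_table"

  // Hypothetical helper: looks the table up in the session catalog.
  // With Hive support enabled this resolves against the Hive metastore,
  // and the table's own SerDe decides how the JSON files are parsed.
  def readTable(spark: SparkSession, table: String): DataFrame =
    spark.table(table)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("ReadHiveJsonTable")
      .enableHiveSupport() // Assumes spark-hive and hive-site.xml are on the classpath
      .getOrCreate()

    try {
      readTable(spark, tableName).show(truncate = false)
      // The same table can also be queried with plain SQL
      spark.sql(s"SELECT * FROM $tableName LIMIT 10").show()
    } finally {
      spark.stop()
    }
  }
}
```

If the SerDe class cannot be found at runtime, one option is to pass its jar when submitting, e.g. `spark-submit --class ReadHiveJsonTable --jars /path/to/serde.jar app.jar` (paths are placeholders).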

Spark SQL programming guide (Spark 2.3.0):

http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#overview
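When reading the JSON files directly, as `ReadJson` does, `spark.read.json` makes an extra pass over the input to infer the schema. If the fields are known in advance, supplying a schema skips that pass and fixes the column types. A minimal sketch; the field names `id` and `name` and the helper `read` are hypothetical and should be adjusted to the actual documents.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object ReadJsonWithSchema {
  // Hypothetical schema; adjust to match the real JSON fields
  val schema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  // With a schema supplied, Spark does not scan the input to infer one
  def read(spark: SparkSession, path: String): DataFrame =
    spark.read.schema(schema).json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("ReadJsonWithSchema")
      .master("local[*]")
      .getOrCreate()

    try {
      val df = read(spark, args(0)) // Path to the JSON file or directory
      df.printSchema()
      df.show(truncate = false)
    } finally {
      spark.stop()
    }
  }
}
```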
