
How to create an RDD directly from Hive table?

I am learning Spark and creating RDDs using the SparkContext object with some local files, S3, and HDFS as follows:

val lines = sc.textFile("file://../kv/mydata.log")

val lines = sc.textFile("s3n://../kv/mydata.log")

val lines = sc.textFile("hdfs://../kv/mydata.log")

Now I have some data in Hive tables. Is it possible to load Hive tables directly and use that data as an RDD?

It can be done using the HiveContext as follows:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val rows = hiveContext.sql("SELECT name, age FROM students")
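If you need an actual RDD rather than a DataFrame, the result of hiveContext.sql can be converted with .rdd. Here is a minimal sketch (Spark 1.x, assuming the students table above exists in the Hive metastore and has string name / int age columns):

// hiveContext.sql returns a DataFrame; .rdd turns it into an RDD[Row]
val studentRows = rows.rdd

// From here, ordinary RDD operations apply
val adults = studentRows
  .map(row => (row.getString(0), row.getInt(1)))
  .filter { case (_, age) => age >= 18 }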

RDDs have now largely been superseded. You can read data directly from Hive tables into DataFrames using the newer Spark APIs. Here is the link for Spark version 2.3.0 (change the version to match your installation):

https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#hive-tables

Here's a sample program. You can store the result of the last line in a DataFrame and perform all the operations you would normally do on an RDD, such as map and filter.

//Accessing Hive tables from Spark
import java.io.File
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
case class People(name:String,age:Int,city:String,state:String,height:Double,weight:Double)
val warehouseLocation = new File("spark-warehouse").getAbsolutePath
val spark = SparkSession.builder
  .master("yarn")
  .appName("My Hive App")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._
import spark.sql
sql("CREATE TABLE IF NOT EXISTS people(name String,age Int,city String,state String,height Double,weight Double)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")
sql("LOAD DATA LOCAL INPATH 'file:/home/amalprakash32203955/data/people1.txt' INTO TABLE people")
sql("SELECT * FROM people").show()
