How to create an RDD directly from a Hive table?
I am learning Spark and creating RDDs with the SparkContext object from local files, S3, and HDFS, as follows:
val lines = sc.textFile("file://../kv/mydata.log")
val lines = sc.textFile("s3n://../kv/mydata.log")
val lines = sc.textFile("hdfs://../kv/mydata.log")
Now I have some data in Hive tables. Is it possible to load a Hive table directly and use its data as an RDD?
It can be done using HiveContext as follows:
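import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
// sql() returns a DataFrame (a SchemaRDD before Spark 1.3); call .rdd on it to get an RDD[Row]
val rows = hiveContext.sql("SELECT name, age FROM students")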
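Since the question asks for an RDD specifically, here is a minimal sketch of the extra step. It assumes a spark-shell session where `sc` already exists, Spark 1.3 or later, and a Hive table `students` whose `name` column is a STRING and whose `age` column is an INT with no NULL values. The age threshold of 18 is only an illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val rows = hiveContext.sql("SELECT name, age FROM students")

// .rdd exposes the query result as an RDD[Row]
val rowRdd: RDD[Row] = rows.rdd

// Convert each Row into a plain (name, age) tuple.
// Assumption: name is STRING and age is INT, both non-null;
// getInt fails on a NULL value.
val students: RDD[(String, Int)] =
  rowRdd.map(r => (r.getString(0), r.getInt(1)))

// Ordinary RDD operations work from here on
students.filter { case (_, age) => age >= 18 }.take(5).foreach(println)
```

On Spark 2.x, HiveContext is deprecated in favor of a SparkSession built with enableHiveSupport(), but the `.rdd` call on the result works the same way.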
For structured data, the RDD API has largely been superseded by DataFrames. You can read data directly from Hive tables into DataFrames using the newer Spark APIs. Here's the link for Spark version 2.3.0 (change the version to match your installation):
https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#hive-tables
Here's a sample program. You can store the result of the last line in a DataFrame and perform the kinds of operations you would normally perform on an RDD, such as map and filter.
//Accessing Hive tables from Spark
import java.io.File
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
case class People(name:String,age:Int,city:String,state:String,height:Double,weight:Double)
val warehouseLocation = new File("spark-warehouse").getAbsolutePath
val spark = SparkSession.builder
  .master("yarn")
  .appName("My Hive App")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._
import spark.sql
sql("CREATE TABLE IF NOT EXISTS people(name String,age Int,city String,state String,height Double,weight Double) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")
sql("LOAD DATA LOCAL INPATH 'file:/home/amalprakash32203955/data/people1.txt' INTO TABLE people")
sql("SELECT * FROM people").show()
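To make the "store the result and use map/filter" part concrete, here is a hedged sketch. It is written so it can be pasted into a fresh spark-shell on its own, so it repeats the case class and obtains the session again. It assumes the `people` table created above exists and has no NULL values in its numeric columns. The names `peopleDF`, `adults`, `peopleDS`, and `adultNames`, and the age threshold of 18, are made up for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Same shape as the people table created above
case class People(name: String, age: Int, city: String, state: String,
                  height: Double, weight: Double)

// getOrCreate() reuses the running session if one already exists
val spark = SparkSession.builder.enableHiveSupport().getOrCreate()
import spark.implicits._

// Keep the query result instead of only printing it
val peopleDF: DataFrame = spark.sql("SELECT * FROM people")

// Untyped, column-based filter on the DataFrame
val adults: DataFrame = peopleDF.filter($"age" >= 18)

// Typed view: filter and map with ordinary Scala lambdas
val peopleDS: Dataset[People] = peopleDF.as[People]
val adultNames: Dataset[String] = peopleDS.filter(_.age >= 18).map(_.name)

// Drop down to the RDD API only when it is actually needed
val peopleRdd: RDD[Row] = peopleDF.rdd
val typedRdd: RDD[People] = peopleDS.rdd

adults.show()
adultNames.show()
```

In a compiled application, the case class would need to be defined at the top level, outside the method that calls `.as[People]`, so that Spark can derive an encoder for it.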