How do I create an RDD directly from a Hive table?

Question

I am learning Spark and creating RDDs with the SparkContext object from local files, S3, and HDFS, as follows:

val lines = sc.textFile("file://../kv/mydata.log")

val lines = sc.textFile("s3n://../kv/mydata.log")

val lines = sc.textFile("hdfs://../kv/mydata.log")

Now I have some data in Hive tables. Is it possible to load a Hive table directly and use its data as an RDD?

Answer 1

It can be done using HiveContext as follows:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // HiveContext is a class, so it needs `new`
val rows = hiveContext.sql("SELECT name, age FROM students")
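Note that `sql` returns a DataFrame rather than a plain RDD. The following is a minimal sketch of getting from that result to an RDD, assuming Spark 1.3 or later, an existing SparkContext named `sc` (as in spark-shell), and that the `students` table stores `name` as STRING and `age` as INT. The column types are an assumption, since the answer does not state them.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext

// Assumes `sc` is an existing SparkContext, e.g. the one provided by spark-shell.
val hiveContext = new HiveContext(sc)
val df = hiveContext.sql("SELECT name, age FROM students")

// DataFrame.rdd exposes the query result as an RDD of generic Row objects.
val rowRdd: RDD[Row] = df.rdd

// Convert each Row to a typed tuple. Assumes name is STRING and age is INT
// and that neither column contains NULLs; getInt throws on a NULL value.
val pairs: RDD[(String, Int)] = rowRdd.map(r => (r.getString(0), r.getInt(1)))
```

From here `pairs` behaves like any RDD created with `sc.textFile`, so the usual `map`, `filter`, and `reduceByKey` calls apply.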

Answer 2

RDDs are no longer the recommended API for this kind of work. You can read data directly from Hive tables into DataFrames using the newer Spark APIs. Here is the documentation for Spark 2.3.0 (change the version to match your installation):

https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#hive-tables

Here is a sample program. You can store the result of the last query in a DataFrame and apply the operations you would normally apply to an RDD, such as map and filter.

//Accessing Hive tables from Spark
import java.io.File
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
case class People(name:String,age:Int,city:String,state:String,height:Double,weight:Double)
val warehouseLocation = new File("spark-warehouse").getAbsolutePath
val spark = SparkSession.builder
  .master("yarn")
  .appName("My Hive App")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._
import spark.sql
sql("CREATE TABLE IF NOT EXISTS people(name String,age Int,city String,state String,height Double,weight Double)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")
sql("LOAD DATA LOCAL INPATH 'file:/home/amalprakash32203955/data/people1.txt' INTO TABLE people")
sql("SELECT * FROM people").show()
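If an actual RDD is still needed, as the question asks, the table can be read as a typed Dataset and then converted. This is a sketch only, assuming Spark 2.x with Hive support on the classpath and the `people` table created by the sample above. The `Person` case class is a hypothetical name introduced here to mirror the table's columns.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical case class; field names and types must match the Hive table's columns.
case class Person(name: String, age: Int, city: String, state: String,
                  height: Double, weight: Double)

// Reuses the running session if one exists (e.g. in spark-shell).
val spark = SparkSession.builder
  .appName("Hive table to RDD")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// Read the Hive table, map rows onto Person, then drop down to the RDD API.
// Assumes the numeric columns contain no NULLs.
val peopleRdd: RDD[Person] = spark.table("people").as[Person].rdd

// Ordinary RDD transformations now work on typed objects.
val adultNames: RDD[String] = peopleRdd.filter(_.age >= 18).map(_.name)
adultNames.take(10).foreach(println)
```

Staying in the DataFrame or Dataset API usually lets Spark optimize the query, so converting with `.rdd` is best kept for code that really requires RDD operations.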

Answer 1: 2019-03-07 17:14:36
Answer 2: 2019-03-07 20:55:02
