
How to use SPARK to query on HIVE?

I am trying to use Spark to run queries on a Hive table. I have followed many of the articles available on the internet, but had no success. I have moved the hive-site.xml file to the Spark conf location.

Could you please explain how to do that? I am using Spark 1.6.

Thank you in advance.

Please find my code below.

import sqlContext.implicits._

// Read the CSV and drop the header row
val hospitalDataText = sc.textFile("/user/cloudera/spark/servicesDemo.csv")
val header = hospitalDataText.first()
val hospitalData = hospitalDataText.filter(a => a != header)

// Parse each line into a case class and convert the RDD to a DataFrame
case class Services(uhid: String, locationid: String, doctorid: String)
val hData = hospitalData.map(_.split(",")).map(p => Services(p(0), p(1), p(2)))
val hosService = hData.toDF()

// Write the DataFrame as Parquet files under the Hive warehouse path
hosService.write.format("parquet").mode(org.apache.spark.sql.SaveMode.Append).save("/user/hive/warehouse/hosdata")

This code created a 'hosdata' folder at the specified path, containing the data in Parquet format.

But when I went to Hive to check whether the table had been created, I could not see any table named 'hosdata'.

So I ran the commands below.

hosService.write.mode("overwrite").saveAsTable("hosData")
sqlContext.sql("show tables").show

which shows the result below:

+--------------------+-----------+
|           tableName|isTemporary|
+--------------------+-----------+
|             hosdata|      false|
+--------------------+-----------+

But again, when I check in Hive, I cannot see the table 'hosdata'.

Could anyone let me know which step I am missing?

There are multiple ways you can query Hive using Spark.

  1. As in the Hive CLI, you can query using Spark SQL.
  2. The spark-shell is available to run Spark code, in which you define the variables you need, such as the Hive and Spark configuration objects. The context's sql() method allows you to execute the same query that you would have executed on Hive.
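For example, on Spark 1.6, a HiveContext talks to the real Hive metastore, whereas a plain SQLContext keeps tables in its own local catalog. A minimal sketch, assuming hive-site.xml has been copied to $SPARK_HOME/conf and reusing the hosService DataFrame from the question:

```scala
import org.apache.spark.sql.hive.HiveContext

// A HiveContext registers tables in the Hive metastore (it reads
// hive-site.xml from the classpath); a plain SQLContext does not.
val hiveContext = new HiveContext(sc)
import hiveContext.implicits._

// Save the DataFrame as a Hive-managed table
hosService.write.mode("overwrite").saveAsTable("hosdata")

// The same HiveQL you would type in the Hive CLI works here
hiveContext.sql("SHOW TABLES").show()
hiveContext.sql("SELECT uhid, doctorid FROM hosdata LIMIT 10").show()
```

Note that if the sqlContext in your shell is a plain SQLContext rather than a HiveContext, saveAsTable writes only to Spark's own metastore (an embedded Derby database by default), which would explain a table that is visible to SHOW TABLES in Spark but never appears in Hive.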

Performance tuning is definitely an important aspect, as you can use broadcast joins and other methods for faster execution.
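As one illustration of the broadcast technique mentioned above (a sketch: the small lookup DataFrame named locations is hypothetical, not part of the question), Spark 1.6 lets you mark the smaller side of a join for broadcasting so the join runs map-side without shuffling the large table:

```scala
import org.apache.spark.sql.functions.broadcast

// Ship the small dimension table to every executor so the join is
// performed locally, avoiding a shuffle of the large hosService table
val joined = hosService.join(broadcast(locations), "locationid")
joined.show()
```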

Hope this helps.
