
Issues with SparkSQL (Spark and Hive connectivity)

I am trying to read data from a Hive database in Spark. Even though the database contains data (I verified this with Hive), running a query through Spark returns no rows (it does return the column information, though).

I have copied the hive-site.xml file into the Spark configuration folder (as required).

IMPORTS

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveContext

Creating a Spark session:

val spark = SparkSession.builder()
  .appName("Reto")
  .config("spark.sql.warehouse.dir", "hive_warehouse_hdfs_path")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()

Getting data:

spark.sql("USE retoiabd")
spark.sql("SELECT count(*) FROM churn").show()

Output:

+--------+
|count(1)|
+--------+
|       0|
+--------+

After checking with our teacher, it turned out the issue was with how the tables themselves were created in Hive.

We created the table like this:

CREATE TABLE name (columns)

Instead of like this:

CREATE EXTERNAL TABLE name (columns)
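For reference, a minimal sketch of what the external-table DDL could look like (the column names, delimiter, and HDFS path below are hypothetical, purely for illustration). An EXTERNAL table tells Hive the data files are managed outside the warehouse, so Hive registers the existing files at the given location instead of expecting to own them:

```sql
-- Hypothetical sketch: register data that already lives in HDFS.
-- Dropping an EXTERNAL table removes only the metadata, not the files.
CREATE EXTERNAL TABLE churn (
    customer_id STRING,
    churned     BOOLEAN
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/external/churn';  -- hypothetical path
```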
