Hi, I am new to Apache Spark, and I am querying Hive tables with Spark SQL in Java.
This is my code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
HiveContext sqlContext = new HiveContext(ctx.sc());
// Two queries against the same table; each result needs its own variable name
Row[] results1 = sqlContext.sql("SELECT * FROM Tablename WHERE Column = 'Value'").collect();
Row[] results2 = sqlContext.sql("SELECT * FROM Tablename WHERE Column = 'Value1'").collect();
When I run these two queries in the same application, I can see that it connects to the Hive metastore each time. How can I avoid this? Also, how do I use the persist option efficiently?
It might help to call sqlContext.cacheTable("Tablename") before executing the two queries.
According to the docs, it does what you're looking for: "Caches the specified table in-memory."
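A minimal sketch of the cacheTable suggestion, assuming the Spark 1.x Java API used in the question (HiveContext, DataFrame). The class name CachedHiveQueries, the helpers registerDemoTable and countBoth, the column name category and the sample rows are all hypothetical. They register a small temporary table under the name Tablename so the sketch runs without an existing Hive table; with a real Hive table, skip registerDemoTable. The last part shows DataFrame.persist with an explicit storage level as one way to approach the persist part of the question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.storage.StorageLevel;

public class CachedHiveQueries {

    // Hypothetical stand-in for the real Hive table: three rows in one string column.
    static void registerDemoTable(JavaSparkContext ctx, HiveContext sqlContext) {
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("category", DataTypes.StringType, false)));
        List<Row> rows = Arrays.asList(
                RowFactory.create("Value"), RowFactory.create("Value1"), RowFactory.create("Value"));
        JavaRDD<Row> rdd = ctx.parallelize(rows);
        sqlContext.createDataFrame(rdd, schema).registerTempTable("Tablename");
    }

    // Caches "Tablename", runs both queries against it, then releases the cache.
    static long[] countBoth(HiveContext sqlContext) {
        // cacheTable is lazy: the first query that scans the table fills the cache,
        // and the second query reads from memory instead of the source table.
        sqlContext.cacheTable("Tablename");
        try {
            Row[] results1 = sqlContext.sql("SELECT * FROM Tablename WHERE category = 'Value'").collect();
            Row[] results2 = sqlContext.sql("SELECT * FROM Tablename WHERE category = 'Value1'").collect();
            return new long[] {results1.length, results2.length};
        } finally {
            // Free the cached data once both queries are done.
            sqlContext.uncacheTable("Tablename");
        }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Hive").setMaster("local");
        JavaSparkContext ctx = new JavaSparkContext(conf);
        HiveContext sqlContext = new HiveContext(ctx.sc());
        registerDemoTable(ctx, sqlContext);

        long[] counts = countBoth(sqlContext);
        System.out.println(counts[0] + " " + counts[1]);

        // Alternative: persist a DataFrame you reuse, choosing the storage level explicitly.
        DataFrame all = sqlContext.table("Tablename").persist(StorageLevel.MEMORY_AND_DISK());
        System.out.println(all.filter("category = 'Value'").count());
        System.out.println(all.filter("category = 'Value1'").count());
        all.unpersist();

        ctx.stop();
    }
}
```

persist lets you pick a storage level such as MEMORY_AND_DISK when the data may not fit in memory. In both approaches, release the cache (uncacheTable or unpersist) once the queries that reuse it have finished.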