Persist option in Apache Spark

Question

Hi, I am new to Apache Spark. I am querying Hive tables with Spark SQL in Java.

This is my code:

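    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.hive.HiveContext;

    public class HiveQuery {
        public static void main(String[] args) {
            SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
            JavaSparkContext ctx = new JavaSparkContext(sparkConf);
            HiveContext sqlContext = new HiveContext(ctx.sc());

            // Two queries on the same HiveContext; each result needs its own variable
            Row[] results1 = sqlContext.sql("Select * from Tablename where Column='Value'").collect();
            Row[] results2 = sqlContext.sql("Select * from Tablename where Column='Value1'").collect();

            ctx.stop();
        }
    }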

I also tried running two different queries in the same application, and I noticed that it connects to the Hive metastore each time. How can I avoid this, and how do I use the persist option efficiently?

Answer 1

It might help to call sqlContext.cacheTable("Tablename") before executing the two queries.
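For example, the table can be cached once, queried twice, and released afterwards. This is a minimal sketch assuming the Spark 1.x Java API (HiveContext, and collect() returning Row[]). Tablename, Value and Value1 come from the question; the class name CacheTableExample, the method queryTwice and the column name some_column are hypothetical stand-ins.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class CacheTableExample {

    // Caches "Tablename" (the question's placeholder), runs both queries
    // against it, and releases the cache afterwards. The column name
    // some_column is a hypothetical stand-in for the question's "Column".
    static Row[][] queryTwice(HiveContext sqlContext) {
        // In Spark 1.x cacheTable is lazy: the in-memory copy is built by
        // the first query that scans the table and reused by the second.
        sqlContext.cacheTable("Tablename");
        try {
            Row[] results1 = sqlContext
                    .sql("SELECT * FROM Tablename WHERE some_column = 'Value'")
                    .collect();
            Row[] results2 = sqlContext
                    .sql("SELECT * FROM Tablename WHERE some_column = 'Value1'")
                    .collect();
            return new Row[][] {results1, results2};
        } finally {
            // Free the cached copy once it is no longer needed.
            sqlContext.uncacheTable("Tablename");
        }
    }
}
```

Wrapping the queries in try/finally makes sure uncacheTable runs even if one of the queries fails, so the cached data does not linger in executor memory.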

According to the docs, cacheTable does what you're looking for:

Caches the specified table in-memory.
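For the persist option itself, a similar effect can be had on a DataFrame: load the table once, persist it with an explicit storage level, derive both results from it, and unpersist it when done. This is a minimal sketch assuming the Spark 1.x Java API; the class name PersistExample, the method queryTwice and the column name some_column are hypothetical, and MEMORY_AND_DISK is just one possible StorageLevel.

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.storage.StorageLevel;

public class PersistExample {

    // Loads the table once, persists it, and derives both results from the
    // persisted DataFrame. some_column is a hypothetical column name.
    static Row[][] queryTwice(HiveContext sqlContext) {
        // MEMORY_AND_DISK keeps partitions in memory and spills the ones
        // that do not fit to local disk instead of re-reading them from Hive.
        DataFrame table = sqlContext.table("Tablename")
                .persist(StorageLevel.MEMORY_AND_DISK());
        try {
            // persist is lazy: the first collect() fills the cache,
            // the second one reads from it.
            Row[] results1 = table.filter("some_column = 'Value'").collect();
            Row[] results2 = table.filter("some_column = 'Value1'").collect();
            return new Row[][] {results1, results2};
        } finally {
            // Release the persisted data once both results are collected.
            table.unpersist();
        }
    }
}
```

MEMORY_ONLY would avoid local disk I/O but recompute any partitions that do not fit in memory; passing the level explicitly keeps that trade-off visible in the code. Persisting only pays off when the same DataFrame feeds more than one action, as the two queries here do.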

Answer 1 (accepted), 2015-07-27 07:07:14