
Persist option in Apache Spark

Hi, I am new to Apache Spark, and I am querying Hive tables using Spark SQL in Java.

This is my code:

    SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx.sc());
    org.apache.spark.sql.Row[] results = sqlContext.sql("SELECT * FROM Tablename WHERE Column = 'Value'").collect();
    org.apache.spark.sql.Row[] results2 = sqlContext.sql("SELECT * FROM Tablename WHERE Column = 'Value1'").collect();

I also tried running two different queries in the same application, and I observed that it opens a connection to the Hive metastore for each query. How can I avoid this, and how do I use the persist option efficiently?

It might help to call sqlContext.cacheTable("Tablename") before executing the two queries.

According to the docs, it does what you're looking for:

Caches the specified table in-memory.
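A minimal sketch of how this could look, using the Spark 1.x `HiveContext` API from the question (the table name `Tablename`, the column `Column`, and the filter values are placeholders from the question, not real names). Note that both queries go through the single `HiveContext`/`SQLContext` instance, so only that one context talks to the metastore:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.storage.StorageLevel;

public class CachedHiveQueries {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(ctx.sc());

        // Cache the whole table in memory once; subsequent queries read the
        // cached data instead of rescanning the underlying Hive table.
        sqlContext.cacheTable("Tablename");

        Row[] first  = sqlContext.sql("SELECT * FROM Tablename WHERE Column = 'Value'").collect();
        Row[] second = sqlContext.sql("SELECT * FROM Tablename WHERE Column = 'Value1'").collect();

        // Alternatively, persist one query's result with an explicit storage
        // level and filter it twice on the Spark side.
        DataFrame df = sqlContext.sql("SELECT * FROM Tablename");
        df.persist(StorageLevel.MEMORY_AND_DISK());
        Row[] filtered = df.filter(df.col("Column").equalTo("Value")).collect();

        // Release the cached data when done.
        sqlContext.uncacheTable("Tablename");
        df.unpersist();
        ctx.stop();
    }
}
```

The difference between the two approaches: `cacheTable` caches the table lazily by name at the SQL layer, while `persist` caches one `DataFrame` with a storage level you choose (`MEMORY_ONLY`, `MEMORY_AND_DISK`, etc.). Either way, reusing a single context for all queries, rather than creating a new one per query, is what avoids the repeated metastore connections.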
