
Persist option in Apache Spark

Hi, I am new to Apache Spark, and I am querying Hive tables using Spark SQL in Java.

This is my code:

    SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx.sc());

    // Note: the original snippet declared "results" twice, which does not
    // compile in Java; the second result needs a different variable name.
    org.apache.spark.sql.Row[] results1 = sqlContext.sql(
        "SELECT * FROM Tablename WHERE Column = 'Value'").collect();
    org.apache.spark.sql.Row[] results2 = sqlContext.sql(
        "SELECT * FROM Tablename WHERE Column = 'Value1'").collect();

I also tried running two different queries in the same application, and I noticed that it makes a new connection to the Hive metastore each time. How can I solve this, and how can I use the persist option efficiently?

It might help to call sqlContext.cacheTable("Tablename") before executing the two queries.

According to the docs, it does what you're looking for:

Caches the specified table in-memory.
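A minimal sketch of how this could fit into the asker's setup (the table name, column names, and filter values are placeholders copied from the question; running it requires Spark with Hive support and a reachable Hive metastore):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class CachedHiveQueries {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(ctx.sc());

        // Cache the table once; subsequent queries against it are served
        // from memory instead of re-reading the underlying Hive data.
        sqlContext.cacheTable("Tablename");

        Row[] results1 = sqlContext.sql(
            "SELECT * FROM Tablename WHERE Column = 'Value'").collect();
        Row[] results2 = sqlContext.sql(
            "SELECT * FROM Tablename WHERE Column = 'Value1'").collect();

        // Release the cached data and shut down when done.
        sqlContext.uncacheTable("Tablename");
        ctx.stop();
    }
}
```

Note that the caching only helps across queries within the same application; reusing a single HiveContext for all queries is also what avoids repeated metastore connections.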

