
Persist option in Apache Spark

Hi, I am new to Apache Spark, and I am querying Hive tables using Spark SQL in Java.

This is my code:

    SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx.sc());

    // Note: the original snippet declared "results" twice, which does not
    // compile in Java; the second result needs a different variable name.
    org.apache.spark.sql.Row[] results1 = sqlContext.sql(
        "SELECT * FROM Tablename WHERE Column = 'Value'").collect();
    org.apache.spark.sql.Row[] results2 = sqlContext.sql(
        "SELECT * FROM Tablename WHERE Column = 'Value1'").collect();

I also tried running two different queries in the same application, and I noticed that it makes a new connection to the Hive metastore each time. How can I solve this, and how can I use the persist option efficiently?

It might help to call sqlContext.cacheTable("Tablename") before executing the two queries.

According to the docs, it does what you're looking for:

Caches the specified table in-memory.
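A minimal sketch of how this could fit into the asker's setup (the table name, column names, and filter values are placeholders copied from the question; running it requires Spark with Hive support and a reachable Hive metastore):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class CachedHiveQueries {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(ctx.sc());

        // Cache the table once; subsequent queries against it are served
        // from memory instead of re-reading the underlying Hive data.
        sqlContext.cacheTable("Tablename");

        Row[] results1 = sqlContext.sql(
            "SELECT * FROM Tablename WHERE Column = 'Value'").collect();
        Row[] results2 = sqlContext.sql(
            "SELECT * FROM Tablename WHERE Column = 'Value1'").collect();

        // Release the cached data and shut down when done.
        sqlContext.uncacheTable("Tablename");
        ctx.stop();
    }
}
```

Note that the caching only helps across queries within the same application; reusing a single HiveContext for all queries is also what avoids repeated metastore connections.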

