
Spark fails with java.lang.OutOfMemoryError: GC overhead limit exceeded?

This is my Java code, in which I query data from Hive using Apache Spark SQL:

JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("LoadData").setMaster("MasterUrl"));
HiveContext sqlContext = new HiveContext(ctx.sc());
List<Row> result = sqlContext.sql("Select * from Tablename").collectAsList();

When I run this code it throws java.lang.OutOfMemoryError: GC overhead limit exceeded. How can I solve this, or how do I increase the memory in the Spark configuration?

If you are using spark-shell to run it, then you can use --driver-memory to bump the memory limit:

spark-shell --driver-memory Xg [other options]

If the executors are having problems, then you can adjust their memory limit with --executor-memory XG.

You can find more info on how exactly to set them in the guides: the submission guide for executor memory, the configuration guide for driver memory.
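For example, the two flags can be combined in a single spark-submit invocation (the class name, master, jar path, and the 4g/2g values below are placeholders; tune the sizes to your data):

```shell
# Raise driver memory (where collectAsList() materializes all rows)
# and executor memory in one spark-submit invocation.
spark-submit \
  --class com.example.LoadData \
  --master yarn \
  --driver-memory 4g \
  --executor-memory 2g \
  target/loaddata.jar
```

Since collectAsList() pulls the entire table back to the driver, the driver-memory setting is usually the one that matters for this particular error.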

@Edit: since you are running it from NetBeans, you should be able to pass them as JVM arguments: -Dspark.driver.memory=XG and -Dspark.executor.memory=XG. I think it was in Project Properties under Run.

Have you found any solutions for your issue yet? Please share them if you have :D

And here is my idea: RDD (and also JavaRDD) has a method toLocalIterator(); the Spark documentation says:

The iterator will consume as much memory as the largest partition in this RDD.

This means the iterator will consume less memory than a List if the RDD is divided into many partitions, so you can try something like this:

Iterator<Row> iter = sqlContext.sql("Select * from Tablename").javaRDD().toLocalIterator();
while (iter.hasNext()) {
    // process one row at a time instead of collecting the whole table
    Row row = iter.next();
    //your code here
}

PS: it's just an idea and I haven't tested it yet.
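The memory difference above can be illustrated with plain Java, independent of Spark: a lazily generated Iterator lets you walk over n elements while holding only the current one on the heap, whereas a List must hold all n at once. This is only an analogy for toLocalIterator() vs collectAsList(); the "row" strings and counts below are made up for the sketch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorVsList {
    // A lazy iterator over n "rows": only the current element is materialized,
    // analogous to consuming toLocalIterator() one partition at a time.
    static Iterator<String> lazyRows(int n) {
        return new Iterator<String>() {
            int i = 0;
            public boolean hasNext() { return i < n; }
            public String next() { return "row-" + (i++); }
        };
    }

    // Materializing the same rows as a List keeps all n of them on the heap,
    // which is what collectAsList() does with an entire Hive table.
    static List<String> allRows(int n) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) rows.add("row-" + i);
        return rows;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long count = 0;
        Iterator<String> iter = lazyRows(n);
        while (iter.hasNext()) { // constant memory, however large n gets
            iter.next();
            count++;
        }
        System.out.println(count); // prints 1000000
    }
}
```

With the iterator, heap usage stays flat as n grows; allRows(n) grows linearly and is the shape of code that hits "GC overhead limit exceeded" first.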
