
Apache Spark 1.2.1 standalone cluster giving Java heap space error

I need to know how to estimate how much heap space (memory) is needed to process x MB of data (say, x = 600 MB) on a Spark standalone cluster.

Scenario:

I have a standalone cluster with 14 GB of memory and 8 cores. I want to process 600 MB of data (read it from files and write it to Cassandra).

For this task my SparkConf contains:

    .set("spark.cassandra.output.throughput_mb_per_sec", "800")
    .set("spark.storage.memoryFraction", "0.3")

When submitting the job, I also pass --executor-memory=5g --total-executor-cores 6 --driver-memory 6g.
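As a rough way to reason about these numbers: in Spark 1.x before 1.6 (the legacy memory model, which applies to 1.2.1), each executor heap is split by fixed fractions. The storage (cache) pool is heap × spark.storage.memoryFraction × spark.storage.safetyFraction (defaults 0.6 and 0.9), and the shuffle pool is heap × spark.shuffle.memoryFraction × spark.shuffle.safetyFraction (defaults 0.2 and 0.8). The hypothetical helper below applies those formulas to a 5 GB executor with memoryFraction set to 0.3. It treats the configured heap as the usable heap, whereas Spark uses Runtime.maxMemory(), which is slightly smaller, so the figures are approximate. These budgets apply to each executor; anything pulled back to the driver must fit in the heap set by --driver-memory instead.

```java
/** Hypothetical helper: rough Spark 1.x (legacy, pre-1.6) executor memory budget. */
public class ExecutorMemoryBudget {

    // Spark 1.x defaults; change these if you override them in SparkConf.
    static final double STORAGE_SAFETY_FRACTION = 0.9; // spark.storage.safetyFraction
    static final double SHUFFLE_MEMORY_FRACTION = 0.2; // spark.shuffle.memoryFraction
    static final double SHUFFLE_SAFETY_FRACTION = 0.8; // spark.shuffle.safetyFraction

    /** Approximate memory (MB) reserved for cached RDD blocks. */
    static double storageMb(double heapMb, double storageMemoryFraction) {
        return heapMb * storageMemoryFraction * STORAGE_SAFETY_FRACTION;
    }

    /** Approximate memory (MB) reserved for shuffle aggregation buffers. */
    static double shuffleMb(double heapMb) {
        return heapMb * SHUFFLE_MEMORY_FRACTION * SHUFFLE_SAFETY_FRACTION;
    }

    public static void main(String[] args) {
        double heapMb = 5 * 1024;     // --executor-memory=5g
        double storageFraction = 0.3; // spark.storage.memoryFraction from the question

        double storage = storageMb(heapMb, storageFraction);
        double shuffle = shuffleMb(heapMb);
        // Heap not reserved by either pool: user objects, safety margins, JVM overhead.
        double rest = heapMb - storage - shuffle;

        System.out.printf("storage ~ %.0f MB%n", storage); // prints "storage ~ 1382 MB"
        System.out.printf("shuffle ~ %.0f MB%n", shuffle); // prints "shuffle ~ 819 MB"
        System.out.printf("other   ~ %.0f MB%n", rest);    // prints "other   ~ 2918 MB"
    }
}
```

Lowering spark.storage.memoryFraction, as in the question, only shrinks the cache pool inside each executor; it does not give the driver any more room.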

Despite the above configuration, I get a Java heap space error while writing data to Cassandra.

Below is the Java code:

    public static void main(String[] args) throws Exception {
        String fileName = args[0];

        Long now = new Date().getTime();

        SparkConf conf = new SparkConf(true)
                .setAppName("JavaSparkSQL_" + now)
                .set("spark.cassandra.connection.host", "192.168.1.65")
                .set("spark.cassandra.connection.native.port", "9042")
                .set("spark.cassandra.connection.rpc.port", "9160")
                .set("spark.cassandra.output.throughput_mb_per_sec", "800")
                .set("spark.storage.memoryFraction", "0.3");

        JavaSparkContext ctx = new JavaSparkContext(conf);

        // Read the input file from HDFS with a minimum of 12 partitions.
        JavaRDD<String> input = ctx.textFile(
                "hdfs://abc.xyz.net:9000/figmd/resources/" + fileName, 12);
        // Parse each partition's JSON lines, then keep only the records of interest.
        JavaRDD<PlanOfCare> result = input.mapPartitions(new ParseJson())
                .filter(new PickInputData());

        System.out.print("Count --> " + result.count());
        // collect() copies every element of the RDD into the driver's heap.
        System.out.println(StringUtils.join(result.collect(), ","));

        // Write the RDD to the ks.pt_planofcarelarge table.
        javaFunctions(result).writerBuilder("ks", "pt_planofcarelarge",
                mapToRow(PlanOfCare.class)).saveToCassandra();
    }

What configuration do I need? Am I missing anything? Thanks in advance.

The JavaRDD collect() method returns a list containing all of the elements of the RDD, and that list is built in the driver's memory.

So in your case it creates a list of 340,000 elements in the driver, which causes the Java heap space error. Instead, take a small sample of your data and collect only that, or save the data directly to disk rather than collecting it.
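Within Spark, a bounded sample can be brought back with result.take(n) or result.takeSample(false, n), both of which return at most n elements to the driver, while saveToCassandra() writes the full RDD from the executors without collecting it. The principle of keeping driver memory bounded can be sketched in plain Java: the hypothetical helper below uses reservoir sampling (Algorithm R) to keep a uniform random sample of k elements from a stream of any length, so memory grows with k rather than with the number of records. It illustrates the idea only; it is not how Spark implements takeSample.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical helper: uniform sample of at most k items using O(k) memory. */
public class ReservoirSample {

    static <T> List<T> sample(Iterator<T> items, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (items.hasNext()) {
            T item = items.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item); // fill the reservoir with the first k items
            } else {
                // Pick an index in [0, seen - 1]; the item is kept with probability k / seen.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Stream 340,000 synthetic records without ever holding them all at once.
        Iterator<Integer> records = IntStream.range(0, 340_000).iterator();
        List<Integer> sample = sample(records, 20, new Random(42));
        System.out.println("sample size = " + sample.size()); // prints "sample size = 20"
    }
}
```

With k = 20 the reservoir holds 20 references however many records stream past, whereas collect() holds every record in the driver at the same time.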

For more information about JavaRDD, refer to the official documentation.
