Which type of Spark memory should I increase on a Java out-of-memory error?

I have a pattern like the one shown below.

def someFunction(...) : ... = 
{
  // Somewhere here some large string (still < 1 GB) is made ...
  //  ... and sometimes I get java.lang.OutOfMemoryError while building that string
}

....
val RDDb = RDDa.map(x => someFunction(...))

Inside someFunction, a large string is built at one point. It is not that big (< 1 GB), but I sometimes get a java.lang.OutOfMemoryError: Java heap space error while building it. This happens even when my executor memory is quite large (8 GB).

According to this article, there is User memory and Spark memory. In my case, which fraction should I increase: the User memory's or the Spark memory's?

PS: I am using Spark version 2.0.

A 1 GB raw string can easily use more than 8 GB of memory. It is better to use streaming processing, such as XMLEventReader for XML.
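As a rough illustration of the streaming idea, here is a minimal sketch. It assumes the data is XML and uses the JDK's javax.xml.stream.XMLEventReader; the answer does not say which XMLEventReader it has in mind, and the names StreamingXmlSketch and countElements are made up for this example. The point is only that events are handled one at a time, so the whole document never has to sit in a single string.

```scala
import java.io.{Reader, StringReader}
import javax.xml.stream.XMLInputFactory

object StreamingXmlSketch {
  // Hypothetical helper: counts start tags with the given local name,
  // pulling one event at a time instead of materializing the document.
  def countElements(xml: Reader, name: String): Int = {
    val reader = XMLInputFactory.newInstance().createXMLEventReader(xml)
    var count = 0
    try {
      while (reader.hasNext) {
        val event = reader.nextEvent()
        if (event.isStartElement &&
            event.asStartElement().getName.getLocalPart == name) {
          count += 1
        }
      }
    } finally {
      reader.close()
    }
    count
  }

  def main(args: Array[String]): Unit = {
    // Tiny in-memory input for demonstration; a real job would pass a file or network reader.
    val sample = new StringReader("<a><b/><b/><c/></a>")
    println(countElements(sample, "b"))
  }
}
```

Inside a Spark map, the same pattern would apply to whatever reader someFunction gets its input from, provided the result can be computed or written incrementally.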

See the memory estimates in the book Algorithms by Robert Sedgewick and Kevin Wayne. Each string has 56 bytes of overhead.

[Image: memory estimate]
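To make the estimate concrete, here is a back-of-the-envelope sketch. It takes the 56-byte overhead quoted above and adds two assumptions that are not stated in the answer: characters are stored as 2-byte UTF-16 units (true for JVMs without compact strings, such as Java 8), and a growing StringBuilder roughly doubles its backing array, so the old and new arrays coexist for a moment. The object and method names are hypothetical.

```scala
object StringMemoryEstimate {
  val OverheadBytes = 56L // fixed per-string overhead, figure quoted from the book
  val BytesPerChar = 2L   // assumption: UTF-16 storage, no compact strings

  // Approximate heap footprint of a finished string with `chars` characters.
  def stringBytes(chars: Long): Long = OverheadBytes + BytesPerChar * chars

  // Approximate peak while a full builder grows: the old array (currentChars)
  // and the new array (assumed to be about 2 * currentChars) are both live.
  def peakGrowthBytes(currentChars: Long): Long =
    BytesPerChar * currentChars + BytesPerChar * 2 * currentChars

  def main(args: Array[String]): Unit = {
    val oneGiChars = 1024L * 1024 * 1024
    println(s"finished string: ${stringBytes(oneGiChars)} bytes")
    println(s"peak during growth: ${peakGrowthBytes(oneGiChars)} bytes")
  }
}
```

Under these assumptions a string of 1 Gi characters needs about 2 GiB once finished and about 6 GiB at the moment the builder grows, before counting the extra copy made by toString or anything else on the heap.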

I wrote a simple test program and ran it with -Xmx8G.

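object TestStringBuilder {
  // 1 M = 1024 * 1024; used for both heap sizes (bytes) and string length (chars)
  val m = 1024 * 1024

  // Prints the JVM's maximum, currently allocated, and free heap in M
  def memUsage(): Unit = {
    val runtime = Runtime.getRuntime

    println(
      s"""max: ${runtime.maxMemory() / m} M
         |allocated: ${runtime.totalMemory() / m} M
         |free: ${runtime.freeMemory() / m} M""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val builder = new StringBuilder()
    val size = 10 * m
    try {
      // Keep appending until the heap runs out
      while (true) {
        builder.append(Math.random())
        // Report only when the length lands exactly on a multiple of 10 M chars
        if (builder.length % size == 0) {
          println(s"len is ${builder.length / m} M")
          memUsage()
        }
      }
    }
    catch {
      case ex: OutOfMemoryError =>
        // Show how long the string got before the heap was exhausted
        println(s"OutOfMemoryError len is ${builder.length/m} M")
        memUsage()
      case ex: Throwable =>
        println(ex)
    }
  }
}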

The output might look something like this.

len is 140 M
max: 7282 M allocated: 673 M free: 77 M
len is 370 M
max: 7282 M allocated: 2402 M free: 72 M
len is 470 M
max: 7282 M allocated: 1479 M free: 321 M
len is 720 M
max: 7282 M allocated: 3784 M free: 314 M
len is 750 M
max: 7282 M allocated: 3784 M free: 314 M
len is 1020 M
max: 7282 M allocated: 3784 M free: 307 M
OutOfMemoryError len is 1151 M
max: 7282 M allocated: 3784 M free: 303 M
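On the original question of User memory versus Spark memory, the sketch below estimates how a heap is split between the two. It relies on my understanding of Spark 2.0's unified memory manager, not on anything stated in this thread: about 300 MB is reserved, spark.memory.fraction (default 0.6) of the remainder is Spark memory, and the rest is User memory, which is where objects created inside a map function, such as the large string, end up. The names SparkMemoryRegions and regions are hypothetical.

```scala
object SparkMemoryRegions {
  val ReservedMb = 300L // assumption: reserved memory in Spark 2.0

  // Returns (sparkMemoryMb, userMemoryMb) for a heap of heapMb megabytes.
  // memoryFraction stands for spark.memory.fraction (assumed default 0.6).
  def regions(heapMb: Long, memoryFraction: Double = 0.6): (Long, Long) = {
    val usable = heapMb - ReservedMb
    val spark = (usable * memoryFraction).toLong
    (spark, usable - spark)
  }

  def main(args: Array[String]): Unit = {
    val (spark, user) = regions(8192L)
    println(s"Spark memory: $spark M, User memory: $user M")
  }
}
```

If this model holds, lowering spark.memory.fraction leaves more room for User memory. It only shifts the boundary, though; it does not change how much memory the string itself needs while it is being built.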
