
Spark: memory issues (GC overhead limit exceeded) when using cogroup with ListBuffer in Scala

I have the following code:

 fTuple2.cogroup(gTuple2).flatMap { t =>

      val fList: ListBuffer[classF] = ListBuffer()
      val gList: ListBuffer[classG] = ListBuffer()

      while (t._2._2.iterator.hasNext) {
        gList.add(t._2._2.iterator.next)
      }

      val fIter = t._2._1.iterator
      while (fIter.hasNext) {
        val f = fIter.next
        val hn = f.getNum()

        //-----------------
        try {
          val gValue = FindGUtiltity.findBestG(hn, gList)
          f.setG(gValue)
        } catch {
          case e: Exception => println("exception caught: " + e);
        }
        fList.add(f)
      }
      fList
 }

and at line:

gList.add(t._2._2.iterator.next)

I got the following error:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
    at scala.collection.mutable.BufferLike$class.appendAll(BufferLike.scala:147)
    at scala.collection.mutable.AbstractBuffer.appendAll(Buffer.scala:48)
    at scala.collection.mutable.BufferLike$class.append(BufferLike.scala:142)
    at scala.collection.mutable.AbstractBuffer.append(Buffer.scala:48)
    at scala.collection.convert.Wrappers$MutableBufferWrapper.add(Wrappers.scala:80)

When the gList size is 1, it works fine. But when the average gList size is about 5, the memory error occurs. The total number of classG instances is not large, so gList should not grow very big. Is gList somehow duplicating itself in Scala? Is there a better way to build a list in Scala, or should I use a Java List here instead?

Thank you!

Your while loop never terminates as long as there is at least one element to iterate over. Every call to t._2._2.iterator returns a new iterator positioned at the start, so hasNext is always true and next always returns the first element. That same element is appended to gList over and over until memory runs out.
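To see the behaviour in isolation, here is a minimal sketch that does not need Spark. The names IteratorDemo, xs, buf and guard are made up for illustration, and the guard counter exists only so that the demo stops; the loop in the question has no such cap.

```scala
import scala.collection.mutable.ListBuffer

object IteratorDemo {
  val xs: Iterable[Int] = List(1, 2, 3)

  // Each call to .iterator builds a fresh iterator starting at the first element.
  val first: Int = xs.iterator.next()
  val again: Int = xs.iterator.next()

  // Same shape as the loop in the question, but capped at 5 rounds.
  // Without the cap, hasNext would stay true forever and buf would grow without bound.
  val buf: ListBuffer[Int] = ListBuffer[Int]()
  private var guard = 0
  while (xs.iterator.hasNext && guard < 5) {
    buf += xs.iterator.next()
    guard += 1
  }
}
```

Both first and again are the first element of xs, and buf ends up holding five copies of it rather than the three distinct elements.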

That's why (based on your comment) getting an iterator only once and assigning it to a val solved it.
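As a rough sketch of what the fixed per-key logic could look like: the case classes F and G and the findBestG below are hypothetical stand-ins for your classF, classG and FindGUtiltity.findBestG, which are not shown in the question, and the "closest number" rule is only an assumption so that the example runs. The tuple has the same shape as one element produced by cogroup, that is (key, (left values, right values)).

```scala
object CogroupFix {
  // Hypothetical stand-ins for the question's classG and classF.
  final case class G(num: Int)
  final case class F(num: Int, g: Option[G] = None)

  // Hypothetical replacement for FindGUtiltity.findBestG:
  // picks the G whose num is closest to hn, or None if there is no G.
  def findBestG(hn: Int, gs: Seq[G]): Option[G] =
    if (gs.isEmpty) None else Some(gs.minBy(g => math.abs(g.num - hn)))

  // Processes one cogroup element: (key, (all F values, all G values)).
  def process(t: (String, (Iterable[F], Iterable[G]))): List[F] = {
    val (_, (fs, gs)) = t
    // Materialise the G side exactly once; no repeated .iterator calls.
    val gList = gs.toList
    // Attach the best G to every F and return the results.
    fs.map(f => f.copy(g = findBestG(f.num, gList))).toList
  }
}
```

With a real RDD this would presumably be called as fTuple2.cogroup(gTuple2).flatMap(CogroupFix.process), assuming a String key. If you prefer to keep the while loops, the minimal change is val gIter = t._2._2.iterator before the loop and then gIter.hasNext / gIter.next inside it. Note also that ListBuffer has no add method of its own; the stack trace shows it going through a Java conversion wrapper, so += is the more direct call in Scala.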


 