OutOfMemory error when using Apache Commons lineIterator
I'm trying to iterate line-by-line over a 1.2 GB file using Apache Commons FileUtils.lineIterator. However, as soon as the LineIterator calls hasNext() I get a java.lang.OutOfMemoryError: Java heap space. I've already allocated 1 GB to the Java heap.
What am I doing wrong here? After reading some docs, isn't LineIterator supposed to stream the file from the file system rather than load it into memory?
Note the code is in Scala:
val file = new java.io.File("data_export.dat")
val it = org.apache.commons.io.FileUtils.lineIterator(file, "UTF-8")
var successCount = 0L
var totalCount = 0L
try {
  while (it.hasNext()) {
    try {
      val legacy = parse[LegacyEvent](it.nextLine())
      BehaviorEvent(legacy)
      successCount += 1L
    } catch {
      case e: Exception => println("Parse error")
    }
    totalCount += 1
  }
} finally {
  it.close()
}
Thanks for your help here!
The code looks good. Most likely the iterator never finds a line terminator in the file, so it tries to read a single "line" larger than 1 GB into memory.
Try wc -l on the file in Unix and see how many lines you actually get.
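If you want to check this from the JVM instead of the shell, you can count newline bytes in fixed-size chunks, which keeps heap usage constant no matter how long any "line" is. A minimal sketch (countNewlines is a hypothetical helper, not part of Commons IO; the 64 KiB buffer size is arbitrary):

```scala
import java.io.{File, FileInputStream}

// Rough equivalent of `wc -l`: count '\n' bytes without ever
// materializing a whole line in memory. Reads 64 KiB at a time,
// so heap use stays flat even if the file has no terminators at all.
def countNewlines(file: File): Long = {
  val in = new FileInputStream(file)
  try {
    val buf = new Array[Byte](64 * 1024)
    var count = 0L
    var read = in.read(buf)
    while (read != -1) {
      var i = 0
      while (i < read) {
        if (buf(i) == '\n'.toByte) count += 1
        i += 1
      }
      read = in.read(buf)
    }
    count
  } finally in.close()
}
```

If this returns 0 (or some tiny number) for your 1.2 GB file, the OutOfMemoryError is explained: hasNext() buffers the entire file looking for a line ending.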