Reading a text file causes OutOfMemoryError

Question

I need to parse a CSV file at work. Each line in the file is not very long, only a few hundred characters. I used the following code to read the file into memory:

// Read every line of the file into an in-memory list
def lines = []
new File( fileName ).eachLine { line -> lines.add( line ) }

When the file has 10,000 lines, the code works just fine. However, when I increase the number of lines to 100,000, I get this error:

java.lang.OutOfMemoryError: Java heap space

For 10,000 lines the file is about 7 MB, and about 70 MB for 100,000 lines. So how would you solve this problem? I know increasing the heap size is a workaround, but are there any other solutions? Thank you in advance.

Answer 1

def lines = []

In Groovy, this creates an ArrayList<E> with size 0 and no preallocation of the internal Object[] backing array.

When you add items and the backing array's capacity is reached, ArrayList allocates a new, larger internal array and copies the existing elements into it. The larger the list, the more time is spent reallocating and copying to make room for new entries. I suspect that's where your memory issue occurs: I'm not certain, but if you're getting an OOM for a relatively small data set, that's where I'd look first. Starting from an empty ArrayList, 100,000 entries cause the backing array to be reallocated roughly 29 times (assuming an expansion factor of 1.5).
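To make the reallocation count concrete, here is a small Java sketch (Java so it runs standalone; Groovy lists are java.util.ArrayList underneath). It does not touch ArrayList itself, it only simulates the growth rule, assuming OpenJDK's policy of growing to old + old/2 (or exactly enough, if that is still too small) and assuming Groovy's [] starts with capacity 0. The class and method names are illustrative.

```java
// Simulates how often a growable array is reallocated while appending elements.
// Assumption: OpenJDK-style ArrayList growth (new capacity = old + old/2,
// or exactly the required size if that is still too small).
class ArrayListGrowth {

    static int countReallocations(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int reallocations = 0;
        for (int size = 0; size < elements; size++) {
            if (size == capacity) {
                // Backing array is full: a bigger one is allocated and copied into.
                int newCapacity = capacity + (capacity >> 1);
                if (newCapacity < size + 1) {
                    newCapacity = size + 1;
                }
                capacity = newCapacity;
                reallocations++;
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        // Assumption: Groovy's [] starts with an empty (capacity 0) backing array.
        System.out.println("empty list:    " + countReallocations(0, 100_000));
        System.out.println("presized list: " + countReallocations(100_000, 100_000));
    }
}
```

Run as is, it prints 30 reallocations for the empty list, close to the rough estimate above, and 0 for the presized one. Each reallocation copies only the array of references, not the line strings themselves.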

If you have a general idea of how large the list needs to be, just set the initial capacity; doing so avoids all the reallocating nonsense. See if this works:

def lines = new ArrayList<String>(100000)
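A Java sketch of the same presizing idea, reading lines into an ArrayList whose initial capacity is set up front. The 100,000 capacity hint and the class and method names are assumptions for illustration; the reader is passed in so the method works with any BufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Reads every line into an ArrayList presized to the expected line count, so the
// backing array is not repeatedly regrown while reading (Java analogue of
// `new ArrayList<String>(100000)` in Groovy).
class PresizedRead {

    static List<String> readAllLines(BufferedReader reader, int expectedLines) {
        List<String> lines = new ArrayList<>(expectedLines);
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PresizedRead <file.csv>");
            return;
        }
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            // Hypothetical capacity hint matching the 100,000-line file.
            List<String> lines = readAllLines(reader, 100_000);
            System.out.println("Read " + lines.size() + " lines");
        }
    }
}
```

Presizing only removes the repeated array copies; every line is still held in memory as a String.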

Answer 2

Assuming you are trying to load the CSV file into a database, you can do something like this. The key Groovy feature is splitEachLine(yourDelimiter), which reads the file one line at a time and passes the closure the list of fields for that line, so the whole file never has to be held in memory. Note that the delimiter argument is a regular expression.

import groovy.sql.*

def sql = Sql.newInstance("jdbc:oracle:thin:@localhost:1521:ORCL",
    "scott", "tiger", "oracle.jdbc.driver.OracleDriver")

// Define a DataSet bound to the staging table TEMP_DATA (its columns must match the keys used below)
def student = sql.dataSet("TEMP_DATA")
// Iterate over the CSV file one line at a time, splitting each line on commas,
// and load the fields into the table.
new File("C:/temp/file.csv").splitEachLine(",") { fields ->
    // Insert the columns of this line into the staging table.
    student.add(
        STUDENT_ID: fields[0],
        FIRST_NAME: fields[1],
        LAST_NAME:  fields[2]
    )
}
// The data is now in the staging table TEMP_DATA.
println "Number of records: " + sql.firstRow("SELECT COUNT(*) AS CNT FROM TEMP_DATA").CNT
sql.close()
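The same streaming approach in plain Java, as a sketch: each line is split and passed to a callback, and nothing is accumulated, so memory use stays flat however large the file is. The names are hypothetical, and like splitEachLine(",") it does a naive split that does not handle quoted fields containing commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Streams a CSV source one line at a time: each line is split on commas and handed
// to a callback, then becomes garbage, so memory does not grow with the file size.
// Assumption: naive splitting with no quoted-field handling.
class StreamingCsv {

    static long forEachRecord(BufferedReader reader, Consumer<String[]> handler) {
        long count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // -1 keeps trailing empty fields (e.g. "a,b," yields 3 fields)
                handler.accept(line.split(",", -1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StreamingCsv <file.csv>");
            return;
        }
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            // Hypothetical handler: prints the first field; a real one might
            // insert the record into a database as in the Groovy answer.
            long n = forEachRecord(reader, fields -> System.out.println(fields[0]));
            System.out.println("Number of records: " + n);
        }
    }
}
```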

2 answers

Answer 1: 1 vote, accepted, 2013-08-26 20:08:34

Answer 2: 0 votes, 2013-08-29 14:28:12