OutOfMemoryError from reading a text file

I need to parse a CSV file at work. Each line in the file is not very long, only a few hundred characters. I used the following code to read the file into memory.

def lines = []
new File( fileName ).eachLine { line -> lines.add( line ) }

When the number of lines is 10,000, the code works just fine. However, when I increase the number of lines to 100,000, I get this error:

java.lang.OutOfMemoryError: Java heap space

For 10,000 lines, the file size is about 7 MB, and ~70 MB for 100,000 lines. So, how would you solve this problem? I know increasing the heap size is a workaround, but are there any other solutions? Thank you in advance.

def lines = []

In Groovy, this creates an ArrayList<E> with size 0 and no preallocation of the internal Object[].

When adding items, if capacity is reached, a new, larger internal array is allocated and the existing entries are copied into it. The larger the list, the more time is spent reallocating to accommodate new entries. I suspect that's where your memory issue occurs because, although I'm not exactly sure how ArrayList allocates its new array, the old and new arrays both sit on the heap while the copy happens; if you're getting OOM for a relatively small data set, that's where I'd look first. For 100,000 entries, the array is reallocated roughly 29 times (assuming an expansion factor of 1.5 and a starting capacity of 1) when starting with an empty ArrayList.
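To check the reallocation count, here is a rough sketch in plain Java (the platform Groovy's lists run on). It only models the growth rule; the formula "old capacity plus half of it, and at least one more slot" is my assumption about how OpenJDK's ArrayList grows, and GrowthCount and countGrowths are made-up names for illustration.

```java
class GrowthCount {
    // Counts how many times a backing array would have to be reallocated
    // before it can hold targetSize entries.
    // Assumed growth rule: newCapacity = old + old / 2, and at least old + 1.
    static int countGrowths(int initialCapacity, int targetSize) {
        int capacity = initialCapacity;
        int growths = 0;
        while (capacity < targetSize) {
            capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
            growths++;
        }
        return growths;
    }

    public static void main(String[] args) {
        System.out.println(countGrowths(1, 100_000));   // prints 29
        System.out.println(countGrowths(10, 100_000));  // prints 23
    }
}
```

Starting from a capacity of 10 instead of 1 brings the count down to about two dozen, so the estimate is in the right ballpark either way.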

If you have a general idea how large the list needs to be, set the initial capacity; doing so avoids all the reallocating nonsense. See if this works:

def lines = new ArrayList<String>(100000)
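If you don't know the line count up front, one possible way to get a "general idea" is to divide the file size by a typical line length. This is only a sketch in plain Java: estimateLines and newLineBuffer are hypothetical helpers, and the 700 bytes per line is an assumption derived from the numbers in the question (about 7 MB for 10,000 lines).

```java
import java.util.ArrayList;
import java.util.List;

class PresizedList {
    // Hypothetical helper: rough line count from the file size and an
    // assumed average line length in bytes.
    static int estimateLines(long fileBytes, int avgLineBytes) {
        long estimate = fileBytes / Math.max(1, avgLineBytes);
        // Stay below the largest array size the JVM is likely to allow.
        return (int) Math.min(estimate, Integer.MAX_VALUE - 8);
    }

    // Creates an empty list whose backing array is already large enough
    // for the estimated number of lines.
    static List<String> newLineBuffer(long fileBytes, int avgLineBytes) {
        return new ArrayList<>(estimateLines(fileBytes, avgLineBytes));
    }

    public static void main(String[] args) {
        // About 70 MB at roughly 700 bytes per line.
        System.out.println(estimateLines(70_000_000L, 700)); // prints 100000
    }
}
```

An estimate that is too low still works; the list just falls back to growing as usual once the preallocated capacity is used up.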

Assuming that you are likely trying to place the CSV file in a database, you can do something like this. The key Groovy feature is splitEachLine(yourDelimiter) and using the fields array in the closure.

import groovy.sql.*

def sql = Sql.newInstance("jdbc:oracle:thin:@localhost:1521:ORCL",
    "scott", "tiger", "oracle.jdbc.driver.OracleDriver")

// Bind a DataSet to the staging table TEMP_DATA (a groovy.sql.DataSet).
def student = sql.dataSet("TEMP_DATA")
// Iterate over the CSV file, splitting each line on commas, and load each row into the table.
new File("C:/temp/file.csv").splitEachLine(",") { fields ->
    // Insert the columns of the current line into the staging table.
    student.add(
        STUDENT_ID: fields[0],
        FIRST_NAME: fields[1],
        LAST_NAME: fields[2]
    )
}
// The data is now in the staging table TEMP_DATA.
println "Number of Records  " + sql.firstRow("Select count(*) from TEMP_DATA")
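
The point of the code above is that each line is handled and then dropped, so the whole file never sits in memory at once. If you don't need a database, the same pattern might look like this in plain Java; it is a minimal sketch, CsvStreaming and countRows are hypothetical names, and the naive comma split (like splitEachLine(",")) does not handle quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CsvStreaming {
    // Hypothetical helper: reads one line at a time and counts the rows that
    // have at least minFields comma-separated columns. Only the current line
    // is kept in memory.
    static long countRows(Reader source, int minFields) {
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split: no support for quoted commas.
                String[] fields = line.split(",", -1);
                if (fields.length >= minFields) {
                    rows++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // In-memory sample data; for a real file, pass a java.io.FileReader instead.
        Reader sample = new StringReader("1,Ann,Lee\n2,Bob\n3,Cy,Day\n");
        System.out.println(countRows(sample, 3)); // prints 2
    }
}
```

Replace the counting with whatever per-line work you actually need; memory use stays flat as long as you don't accumulate the lines.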
