OutOfMemoryError from reading a text file
I need to parse a CSV file at work. Each line in the file is not very long, only a few hundred characters. I used the following code to read the file into memory.
def lines = []
new File( fileName ).eachLine { line -> lines.add( line ) }
When the number of lines is 10,000, the code works just fine. However, when I increase the number of lines to 100,000, I get this error:
java.lang.OutOfMemoryError: Java heap space
For 10,000 lines, the file size is about 7 MB, and about 70 MB for 100,000 lines. So, how would you solve this problem? I know increasing the heap size is a workaround, but are there any other solutions? Thank you in advance.
def lines = []
In Groovy, this creates an ArrayList<E> with size 0 and no preallocation of the internal Object[].
When adding items, if capacity is reached, a new, larger internal array is allocated and the existing elements are copied into it (the ArrayList object itself is not replaced). The larger the list, the more time is spent on each reallocation to accommodate new entries, and during the copy both the old and the new array are held in memory. I suspect that's where your memory issue occurs; I'm not exactly sure how ArrayList sizes each new array, but if you're getting an OOM for a relatively small data set, that's where I'd look first. For 100,000 entries, starting with an empty ArrayList, the backing array is reallocated roughly 29 times (assuming an expansion factor of 1.5).
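To make the reallocation count concrete, here is a minimal sketch that simulates the growth of the backing array. It assumes the OpenJDK behaviour of a default capacity of 10 and a growth rule of oldCapacity + (oldCapacity >> 1); other JVMs or versions may differ, and countResizes is a hypothetical helper, not part of any library. The figure of roughly 29 above corresponds to pure 1.5x growth from a capacity of 1, so the count under these assumptions comes out somewhat lower.

```groovy
// Hypothetical helper: counts how often the backing array would be reallocated.
// Assumes OpenJDK's rule newCapacity = oldCapacity + (oldCapacity >> 1)
// and a default initial capacity of 10.
int countResizes(int targetSize, int initialCapacity = 10) {
    int capacity = initialCapacity
    int resizes = 0
    while (capacity < targetSize) {
        // Grow by about 1.5x, but always by at least one slot.
        capacity = Math.max(capacity + (capacity >> 1), capacity + 1)
        resizes++
    }
    return resizes
}

println countResizes(100000)          // prints 23: grown step by step from capacity 10
println countResizes(100000, 100000)  // prints 0: presized list never reallocates
```

Each resize copies only object references, so the per-resize cost is modest; the sketch is meant to show how many times that copy happens, not to measure heap usage.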
If you have a general idea of how large the list needs to be, just set the initial capacity; doing so avoids all the reallocating nonsense. See if this works:
def lines = new ArrayList<String>(100000)
Assuming that you are trying to load the CSV file into a database, you can do something like this. The key Groovy feature is splitEachLine(yourDelimiter), which passes the split fields of each line to the closure.
import groovy.sql.*
def sql = Sql.newInstance("jdbc:oracle:thin:@localhost:1521:ORCL",
        "scott", "tiger", "oracle.jdbc.driver.OracleDriver")
// Define a DataSet that maps to the staging table TEMP_DATA.
def student = sql.dataSet("TEMP_DATA")
// Iterate over the CSV file, splitting each line on commas, and load each row into the table.
new File("C:/temp/file.csv").splitEachLine(",") { fields ->
    // Insert each column we have into the temp table.
    student.add(
        STUDENT_ID: fields[0],
        FIRST_NAME: fields[1],
        LAST_NAME: fields[2]
    )
}
// The data is now in the staging table TEMP_DATA.
println "Number of Records " + sql.firstRow("Select count(*) from TEMP_DATA")
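The same line-at-a-time pattern works without a database. The sketch below is a self-contained illustration, not the poster's code: the temp file, its three sample rows, and the variable names are all made up for the example. It keeps only a running count and one column rather than every full line, so memory use depends on what you choose to retain, not on the file size.

```groovy
// Hypothetical sample data, written to a temp file so the sketch runs on its own.
def csv = File.createTempFile("students", ".csv")
csv.text = "1,Ada,Lovelace\n2,Alan,Turing\n3,Grace,Hopper\n"

int rowCount = 0
def lastNames = []   // keep only the column we need, not the whole line

// splitEachLine reads one line at a time and hands the split fields to the closure.
csv.splitEachLine(",") { fields ->
    rowCount++
    lastNames << fields[2]
}

println "rows: ${rowCount}, last names: ${lastNames}"
csv.delete()
```

Note that splitting on a plain comma does not handle quoted fields that contain commas; real-world CSV may need a proper parser.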