
CSV Java file reading and saving (in different ArrayLists)

OK, here is my code. I have a problem: "records.csv" is a file that contains roughly 20 million lines, each made of 4 fields separated by a ','.

As you can see from the code, I'd like to have 4 ArrayLists, each holding all the values of a different field. After a while the method stops working (I think because, to 'add' an element to the list, Java has a pointer that has to traverse the whole ArrayList first).

I need to solve this, but I don't know how.

Suggestions?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public class RecordReader {
    static ArrayList<String> id = new ArrayList<String>();
    static ArrayList<String> field1 = new ArrayList<String>();
    static ArrayList<String> field2 = new ArrayList<String>();
    static ArrayList<String> field3 = new ArrayList<String>();

    public static void Reader() {
        try {
            FileReader filein = new FileReader("Y:/datasets/records.csv");
            String token = "";
            String flag = "id";
            int index = 0, next;

            do {
                next = filein.read();

                if (next != -1) {

                    if (next != ',' && next != '\n')
                        token = token + next;

                    else if (next == ',') {
                        if (flag.compareTo("id") == 0) { id.add(index, token); flag = "field1"; }
                        else if (flag.compareTo("field1") == 0) { field1.add(index, token); token = ""; flag = "field2"; }
                        else if (flag.compareTo("field2") == 0) { field2.add(index, token); token = ""; flag = "field3"; }
                    }

                    else if (next == '\n') {
                        if (flag.compareTo("field3") == 0) { field3.add(index, token); token = ""; flag = "id"; index++; }
                    }

                    char nextc = (char) next;
                    System.out.print(nextc);
                }
            } while (next != -1);

            filein.close();
        }
        catch (IOException e) { System.out.println("ERRORE, birichino!"); }
    }
}

I have to do it all at once; the file is 711000 bytes.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.nio.CharBuffer.wrap(Unknown Source)
    at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
    at sun.nio.cs.StreamEncoder.write(Unknown Source)
    at java.io.OutputStreamWriter.write(Unknown Source)
    at java.io.BufferedWriter.flushBuffer(Unknown Source)
    at java.io.PrintStream.write(Unknown Source)
    at java.io.PrintStream.print(Unknown Source)
    at RecordReader.Reader(RecordReader.java:42)
    at prova.main(prova.java:26)

I have a couple of suggestions for you.

First, you don't need 4 separate ArrayLists; just one will do. Instead of using filein.read(), I would wrap your FileReader in a BufferedReader and use it to read the file line by line, adding each line to a single ArrayList.

BufferedReader br = new BufferedReader(filein);
ArrayList<String> content = new ArrayList<String>();
String line = br.readLine();
while(line != null){
    //add lines to ArrayList
    content.add(line);
    line = br.readLine();
}

This will read the contents of the entire file into memory without the additional overhead of 3 extra ArrayLists.

Second, since your fields are separated by a comma and (I'm assuming) every line has the same number of fields, you can use the split() method to separate each line into an array of strings.

String[] record = content.get(index).split(",");
//record[0] = id
//record[1] = field1
//record[2] = field2
//record[3] = field3

Put the above into a loop and you can iterate over all of the file's contents. Since you know how the information is ordered, retrieving the information you want is trivial.
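As a rough sketch of that loop (the class name SplitLoopSketch, the helper splitAll and the sample lines are made up for illustration, and it assumes no field ever contains a quoted comma), it might look something like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original code.
class SplitLoopSketch {

    // Splits every stored line on commas.
    // Assumption: fields never contain quoted or escaped commas.
    static List<String[]> splitAll(List<String> content) {
        List<String[]> records = new ArrayList<>(content.size());
        for (String line : content) {
            // A limit of -1 keeps trailing empty fields, so "2,d,e," still yields 4 entries.
            records.add(line.split(",", -1));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the contents of records.csv.
        List<String> content = Arrays.asList("1,a,b,c", "2,d,e,f");
        for (String[] record : splitAll(content)) {
            // record[0] = id, record[1] = field1, record[2] = field2, record[3] = field3
            System.out.println("id=" + record[0] + " field3=" + record[3]);
        }
    }
}
```

Note that keeping both the raw lines and the split arrays roughly doubles the memory used, so in practice you would probably split each line on demand or drop the raw line once it has been split.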

However, I will warn you that with a sufficiently large file (multiple GB of data), this approach will eventually fail as well.
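If whatever you do with the data can be done one line at a time, one way around that limit is to not store the lines at all and keep only a running result. This is only a sketch under that assumption: the class StreamingSketch, the method countRecordsWithField and the sample data are invented for the example, and it takes a Reader so you can pass either a FileReader or, as here, a StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example class, not part of the original code.
class StreamingSketch {

    // Reads line by line and keeps only a counter, never the whole file.
    // Assumption: fields never contain quoted or escaped commas.
    static long countRecordsWithField(Reader source, int fieldIndex, String wanted)
            throws IOException {
        long matches = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] record = line.split(",", -1);
                if (record.length > fieldIndex && record[fieldIndex].equals(wanted)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Made-up data; for the real file you would pass new FileReader(path) instead.
        Reader sample = new StringReader("1,a,b,c\n2,a,e,f\n3,x,y,z\n");
        System.out.println(countRecordsWithField(sample, 1, "a"));
    }
}
```

Memory use then depends on the longest line rather than on the size of the file, but it only helps if you really don't need every record in memory at the same time.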

Can you try running the application with the -Xmx option, as shown below?

java -Xmx6g [javaclassfile]

I was able to resolve a similar problem this way.
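To check that the option actually took effect, you could print the maximum heap the JVM reports. A minimal sketch (the class name HeapCheckSketch and the method maxHeapMiB are made up here; Runtime.maxMemory() is the standard call):

```java
// Hypothetical example class, not part of the original code.
class HeapCheckSketch {

    // Maximum amount of memory the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once without -Xmx and once with -Xmx6g should show the difference; the exact number reported is usually a bit lower than the value you asked for.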
