How to work with a large CSV file

I have a very large CSV file and I need to run select-style queries on it, such as computing an average. I cannot do that the usual way, reading line by line, because I run out of memory.

The following code works well on a small CSV file but not on a huge one. I would appreciate it if you could edit this code so that it works for a large CSV file.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Mu {
    public void Computemu()
    {
        String filename="testdata.csv";
        File file=new File(filename);
        try {
            Scanner inputstream=new Scanner(file);//Scanner returns tokens as strings
            // String data=inputstream.next();//Ignore the first line(header)
            double sum=0;
            double numberOfRating=0;

            while (inputstream.hasNext())
            {
                String data=inputstream.next();//meant to get a whole line
                String[] values= data.split(";");//values are separated by ;
                double rating=Double.parseDouble(values[2].replaceAll("\"", ""));//strip the quotes and convert the string to a double
                if(rating>0)//do not consider implicit ratings
                {
                    sum+=rating;
                    numberOfRating++;
                }
            }
            inputstream.close();
            System.out.println("Mu is "+ (sum/numberOfRating));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

You didn't call useDelimiter, so next() splits on whitespace (the default delimiter). If the file contains no whitespace, next() has to load the whole file into a single string.

This leads to an OutOfMemoryError.

If you want to use a Scanner, set the delimiter according to your needs.

But a CSV library (like csvfile) would probably be more efficient.
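
As a rough sketch of the Scanner route mentioned above: setting the delimiter to line breaks makes next() return one line at a time, so only the current line is held in memory. The class name ScannerMean, the method meanPositiveRating and the skipHeader flag are made up for illustration; the column layout (a quoted rating in the third ;-separated field) is taken from the question's code.

```java
import java.util.Scanner;

// Hypothetical helper class, not part of the original code.
class ScannerMean {

    // Averages the positive ratings found in the third ';'-separated column.
    // Returns NaN when no positive rating is found.
    static double meanPositiveRating(Scanner in, boolean skipHeader) {
        // Split tokens on runs of CR/LF instead of on any whitespace,
        // so each call to next() yields one non-empty line.
        in.useDelimiter("[\\r\\n]+");

        if (skipHeader && in.hasNext()) {
            in.next(); // discard the header line
        }

        double sum = 0;
        long count = 0;
        while (in.hasNext()) {
            String[] values = in.next().split(";");
            if (values.length < 3) {
                continue; // assumption: silently skip malformed lines
            }
            double rating = Double.parseDouble(values[2].replace("\"", "").trim());
            if (rating > 0) { // do not consider implicit ratings
                sum += rating;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

For a file on disk, this could be called with new Scanner(new File("testdata.csv")) inside a try-with-resources block so the Scanner is closed afterwards.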

I suggest using FileUtils from Apache Commons IO for this use case. This may not be exactly what you asked for in your question, but using FileUtils is preferable to re-implementing the same functionality.

Specifically, please look at the lineIterator method, which walks through the file one line at a time instead of loading it all into memory.
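
For readers who would rather not add a dependency, the same line-at-a-time idea can be sketched with the JDK alone. Note that this swaps in BufferedReader.lines() in place of the Commons IO line iterator, so it is not the library call suggested above. The names StreamingMean, meanPositiveRating and skipHeader are hypothetical; the column layout follows the question's code.

```java
import java.io.BufferedReader;

// Hypothetical helper class, JDK-only stand-in for a line iterator.
class StreamingMean {

    // Lazily streams the lines of the reader and averages the positive
    // ratings in the third ';'-separated column. Returns NaN if none exist.
    static double meanPositiveRating(BufferedReader reader, boolean skipHeader) {
        return reader.lines()                      // lines are read on demand
                .skip(skipHeader ? 1 : 0)          // optionally drop the header
                .map(line -> line.split(";"))
                .filter(values -> values.length > 2) // assumption: skip malformed lines
                .mapToDouble(values ->
                        Double.parseDouble(values[2].replace("\"", "").trim()))
                .filter(rating -> rating > 0)      // do not consider implicit ratings
                .average()
                .orElse(Double.NaN);
    }
}
```

For a real file, the reader could come from Files.newBufferedReader(Paths.get("testdata.csv")) in a try-with-resources block.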
