
Processing a large file in Java

How can I parse a large file (about 1.2 GB, with 36,259,190 lines in total), turning each line into an object and saving it in a list?

I get an OutOfMemoryError every time.

List<Point> points = new ArrayList<>();

public void m2() throws IOException {
    try (BufferedReader reader = Files.newBufferedReader(Paths.get(DATAFILE))) {
        reader.lines()
              .map(s -> s.split(","))
              .skip(0) // skip(0) is a no-op: it skips nothing
              .forEach(p -> points.add(new Point(p[0], p[1], p[2])));
    }
}


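class Point {
    String X;
    String Y;
    String Z;
    // Constructor needed by the new Point(p[0], p[1], p[2]) call above
    Point(String x, String y, String z) {
        X = x;
        Y = y;
        Z = z;
    }
}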

Pay attention to your data types. I'm quite sure your points do not consist of three text fragments, so declare the fields of Point with their actual type, e.g. int or double. A double field takes 8 bytes, whereas each String is a separate object with its own header and backing character array, so these primitive types consume significantly less memory than their String representations.

class Point {
    double x, y, z;
    Point(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
    Point(String x, String y, String z) {
        this.x = Double.parseDouble(x);
        this.y = Double.parseDouble(y);
        this.z = Double.parseDouble(z);
    }
}

Then collect the points from your data file like this:

public List<Point> m2() throws IOException {
    try(BufferedReader reader = Files.newBufferedReader(Paths.get(DATAFILE))) {
        return reader.lines().map(s -> s.split(","))
            .map(a -> new Point(a[0], a[1], a[2]))
            .collect(Collectors.toList());
    }
}
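Because the number of lines is known in advance (36,259,190 in the question), the list can also be created with that capacity, so the ArrayList never has to grow and copy its backing array while loading. The self-contained sketch below takes a BufferedReader instead of a file path so it can be tried on an in-memory string; the class name PointReader, the method readPoints and its expectedLines parameter are hypothetical. Collectors.toCollection accepts a supplier for the target collection, which is where the capacity is passed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PointReader {
    // Same shape as the Point class above: three primitive doubles, no Strings kept.
    static final class Point {
        final double x, y, z;
        Point(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    // Parses "x,y,z" lines into Points. expectedLines is only a capacity hint
    // (hypothetical parameter) so the ArrayList does not need to resize while filling.
    static List<Point> readPoints(BufferedReader reader, int expectedLines) {
        return reader.lines()
                .map(s -> s.split(","))
                .map(a -> new Point(Double.parseDouble(a[0]),
                                    Double.parseDouble(a[1]),
                                    Double.parseDouble(a[2])))
                .collect(Collectors.toCollection(() -> new ArrayList<Point>(expectedLines)));
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory sample instead of the real 1.2 GB file
        try (BufferedReader r = new BufferedReader(new StringReader("1,2,3\n4.5,6,7\n"))) {
            List<Point> points = readPoints(r, 2);
            System.out.println(points.size() + " points, last x = " + points.get(1).x);
        }
    }
}
```

For the real file, you would pass Files.newBufferedReader(Paths.get(DATAFILE)) as the reader and 36_259_190 as the capacity.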

Then, as others have noted, pay attention to the memory allocated to your JVM. With the Point class above, you can handle 36 million instances in a heap of about 1.5 GiB without problems.
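That figure can be checked with a rough estimate. It assumes a 64-bit HotSpot JVM with compressed oops (a 12-byte object header and 4-byte references); these are assumptions, and object layout differs between JVMs and settings. Under them, each Point with three double fields takes 12 + 3 × 8 = 36 bytes, rounded up to 40 by 8-byte alignment, and the ArrayList stores one 4-byte reference per element:

$$
36\,259\,190 \times (40 + 4)\ \text{B} = 1\,595\,404\,360\ \text{B} \approx 1.49\ \text{GiB}
$$

This ignores the spare capacity an unsized ArrayList accumulates while growing and the temporary copy made during each resize, so leaving some headroom above this number is sensible.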

You need to use the command-line options -Xms (initial heap size) and -Xmx (maximum heap size).

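Examples:

-Xmx4G (4 GiB)
-Xmx200M (200 MiB)
java -Xmx8G -jar program.jar

JVM options such as -Xmx must come before -jar; anything after the JAR name is passed to your program's main method as an argument instead.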
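To confirm that an -Xmx setting actually took effect, a small program can print the limit the JVM reports. This is a sketch with a hypothetical class name, HeapInfo. Runtime.maxMemory() returns the maximum amount of memory the JVM will attempt to use, which normally tracks -Xmx but is not guaranteed to match it exactly.

```java
class HeapInfo {
    // Maximum heap the JVM will attempt to use, in MiB. Normally this follows -Xmx;
    // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it as java -Xmx8G HeapInfo should print a value in the region of 8192 MiB; if it prints something much smaller, the option did not reach the JVM.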

The answer by Shiro is correct: allocate more memory to Java.

Database

If you cannot afford that much memory, use a database instead, for example Postgres or H2.

One of the purposes of a database is to persist data to storage while handling memory efficiently, both when running queries and when loading data as needed.

As you read each line of the data file, store it in the database immediately. Later, query for the records you need, and instantiate objects in memory only for the rows in that query's result set.
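A minimal sketch of that approach using plain JDBC follows. It assumes a table named point with columns x, y and z already exists; the table and column names, the class name PointDbLoader and the batch size of 10,000 are all hypothetical. It reads one line at a time, adds it to a batched PreparedStatement and commits every batchSize rows, so only the current batch is held in memory rather than the whole file. Running main requires a JDBC driver on the classpath, for example H2's, with a URL such as jdbc:h2:./points.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class PointDbLoader {
    // Hypothetical table, assumed to exist already:
    //   CREATE TABLE point (x DOUBLE PRECISION, y DOUBLE PRECISION, z DOUBLE PRECISION)
    static final String INSERT_SQL = "INSERT INTO point (x, y, z) VALUES (?, ?, ?)";

    // Streams "x,y,z" lines into the database in JDBC batches and returns the row count.
    static long load(BufferedReader reader, Connection conn, int batchSize)
            throws IOException, SQLException {
        conn.setAutoCommit(false); // commit once per batch instead of once per row
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] a = line.split(",");
                ps.setDouble(1, Double.parseDouble(a[0]));
                ps.setDouble(2, Double.parseDouble(a[1]));
                ps.setDouble(3, Double.parseDouble(a[2]));
                ps.addBatch();
                if (++count % batchSize == 0) { // batch full: send it and commit
                    ps.executeBatch();
                    conn.commit();
                }
            }
            if (count % batchSize != 0) { // flush the final, partially filled batch
                ps.executeBatch();
                conn.commit();
            }
        } catch (IOException | SQLException | RuntimeException e) {
            conn.rollback(); // discard the batch that was not committed yet
            throw e;
        }
        return count;
    }

    // args[0]: JDBC URL (e.g. jdbc:h2:./points, needs the H2 driver); args[1]: data file path
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(args[0]);
             BufferedReader reader = Files.newBufferedReader(Paths.get(args[1]))) {
            System.out.println(load(reader, conn, 10_000) + " rows loaded");
        }
    }
}
```

A larger batch size means fewer round trips but more rows held in the driver at once; the right value depends on the database and driver.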

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 