
How to efficiently read a huge text file in Java and split its content to sort it?

I have a text file (around 360741 KB) in which every line has the following structure:

123123123123,123123123,1,123123,123123,NAME1,LASTNAME1,LASTNAME2

Since I need to sort the file by name, I'm trying to place it in a LinkedList so that it is easier to sort with an algorithm like merge sort or quicksort.

The issue is that it takes too long to split every line and place it in the LinkedList.

Could you suggest an alternative that does this in a more time-efficient way?

What I'm doing:

    // Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException and java.util.LinkedList.
    // try-with-resources closes the reader even if reading fails part-way.
    try (BufferedReader in = new BufferedReader(new FileReader("C:\\Users\\MyDirectory\\File.txt"))) {
        String str;
        LinkedList<Persona> li = new LinkedList<>();
        while ((str = in.readLine()) != null) {
            String[] array = str.split(",");

            // Take the values from the array to create an instance of the class and place it in the LinkedList.
            li.add(new Persona(array[0], array[1], array[2], array[3], array[4], array[5], array[6], array[7]));
        }
        System.out.println("fin");
    } catch (IOException e) {
        System.out.println("File Read Error");
    }

LinkedList is not particularly memory-efficient and is a poor fit for the built-in sort algorithms. I suggest you load each line into an ArrayList and extract only the name from it (there is no need to split every field, as you don't use the others for sorting).

You can sort an ArrayList with Collections.sort and a custom comparator.
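As a minimal sketch of that suggestion: the class name NameSortSketch and the helper nameKey are made up for illustration, and the code assumes the file is UTF-8 and that the name starts at the sixth comma-separated field, as in the sample line from the question. It keeps each line as a single String and only locates the name when comparing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class NameSortSketch {
    // Hypothetical helper: returns everything after the fifth comma,
    // i.e. "NAME,LASTNAME1,LASTNAME2", without splitting the other fields.
    static String nameKey(String line) {
        int pos = -1;
        for (int i = 0; i < 5; i++) {          // skip the first five commas
            pos = line.indexOf(',', pos + 1);
            if (pos < 0) return "";            // malformed line: sorts first
        }
        return line.substring(pos + 1);
    }

    // Reads all lines into an ArrayList and sorts them by the name key.
    // Assumption: the file is UTF-8 encoded.
    static List<String> readSorted(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        Collections.sort(lines, Comparator.comparing(NameSortSketch::nameKey));
        return lines;
    }
}
```

The comparator recomputes the key on every comparison, which trades some CPU time for not holding a second String per line. If memory allows, the keys could be extracted once up front instead.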

Note: you can expect 352 MB of text to use at least 1 GB of memory once loaded, and I would suggest giving the JVM a 2-4 GB heap (for example with -Xmx4g) to improve performance.

If the file does not fit in memory, sorting it would take 5 different steps:

1) Split the file into sections (you choose the size, maybe 10 MB chunks), something manageable given your seemingly small RAM capacity.

2) Sort each chunk individually and save it to its own file (this makes the chunks easy to manage).

3) Merge the sorted chunks into separate files named after the letters A-Z (or split further, depending on how many A's there are compared to Z's, e.g. A1.txt, A2.txt, A3.txt, etc.).

4) Sort the merged files by group into separate, larger files (all the A's, then all the B's, etc.).

5) Merge the files into one large file (if you wish), in order.

Note: This is also known as an external sort. And you shouldn't be using linked lists. Try an array-backed list (something like a vector) together with one of the built-in sorting functions.
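A rough sketch of an external sort, under stated assumptions: it follows steps 1 and 2 above, but instead of the per-letter bucketing in steps 3 and 4 it swaps in the more common k-way merge of the sorted chunks through a PriorityQueue. The names ExternalSortSketch, Run, nameKey and the linesPerChunk parameter are invented for illustration, and the name is again assumed to start at the sixth field.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ExternalSortSketch {
    // Hypothetical key: everything after the fifth comma (NAME,LASTNAME1,LASTNAME2).
    static String nameKey(String line) {
        int pos = -1;
        for (int i = 0; i < 5; i++) {
            pos = line.indexOf(',', pos + 1);
            if (pos < 0) return "";
        }
        return line.substring(pos + 1);
    }

    static final Comparator<String> BY_NAME = Comparator.comparing(ExternalSortSketch::nameKey);

    // One open chunk file plus the line currently at its head.
    static final class Run {
        final BufferedReader reader;
        String head;
        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    // Sorts the chunk in memory and writes it to its own temporary file.
    static Path writeChunk(List<String> buf) throws IOException {
        buf.sort(BY_NAME);
        Path p = Files.createTempFile("chunk", ".txt");
        Files.write(p, buf);
        return p;
    }

    static void sort(Path input, Path output, int linesPerChunk) throws IOException {
        // Steps 1-2: read a bounded number of lines at a time, sort, save each chunk.
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> buf = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buf.add(line);
                if (buf.size() >= linesPerChunk) {
                    chunks.add(writeChunk(buf));
                    buf.clear();
                }
            }
            if (!buf.isEmpty()) chunks.add(writeChunk(buf));
        }
        // Merge: the heap always yields the chunk whose head line is smallest.
        PriorityQueue<Run> heap = new PriorityQueue<>((a, b) -> BY_NAME.compare(a.head, b.head));
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            for (Path p : chunks) {
                Run r = new Run(Files.newBufferedReader(p));
                if (r.head != null) heap.add(r); else r.reader.close();
            }
            while (!heap.isEmpty()) {
                Run r = heap.poll();
                out.write(r.head);
                out.newLine();
                r.head = r.reader.readLine();
                if (r.head != null) heap.add(r); else r.reader.close();
            }
        } finally {
            for (Run r : heap) r.reader.close();   // non-empty only if the merge was interrupted
            for (Path p : chunks) Files.deleteIfExists(p);
        }
    }
}
```

Only one chunk (during the first phase) or one line per chunk (during the merge) is held in memory at a time, so linesPerChunk can be tuned to the available heap.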
