
Java - Sorting and CSV: good practice with huge data

I need to sort a huge CSV file (10+ million records) with several algorithms in Java, but I'm running into memory problems.

Basically, I have a huge CSV file where every record has 4 fields of different types (String, int, double). I need to load this CSV into some structure and then sort it by each field.

My idea was to write a Record class (with its own fields), read the CSV file line by line, create a new Record object for every line, and put them all into an ArrayList, then call my sorting algorithms for each field. Roughly, it looks like the sketch below.
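Something like this (simplified; the field names are just placeholders, not my real ones):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class Loader {

        // One CSV record: 4 fields of mixed types (String, int, double).
        static class Record {
            final String name;      // placeholder field names
            final int id;
            final double value;
            final String category;

            Record(String name, int id, double value, String category) {
                this.name = name;
                this.id = id;
                this.value = value;
                this.category = category;
            }
        }

        public static void main(String[] args) throws IOException {
            List<Record> records = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(new FileReader("data.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] f = line.split(",");
                    records.add(new Record(f[0], Integer.parseInt(f[1]),
                                           Double.parseDouble(f[2]), f[3]));
                }
            }
            // then: run each sorting algorithm on 'records', once per field
            // -> this is where it blows up with 10M+ rows
        }
    }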

It doesn't work: I get an OutOfMemoryError when I try to load all the Record objects into my ArrayList.

This way I create tons of objects, and I don't think that's a good idea. What should I do with this huge amount of data? Which method/data structure would be less expensive in terms of memory usage?

My goal is just to run the sorting algorithms and see how they perform on a big data set; saving the sorted result to a file isn't important.

I know there are some libraries for CSV, but I have to implement this without external libraries.

Thank you very much! :D

Cut your file into pieces (depending on the size of the file) and look into merge sort, i.e. an external merge sort. That way you can sort even big files without using a lot of memory; it's what databases use when they have to do huge sorts. See the sketch below.
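A minimal sketch of the idea, assuming one record per line and a comparator over whole lines (the chunk size, temp-file handling and parsing are simplified assumptions):

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.PriorityQueue;

    public class ExternalSort {

        static final int CHUNK_SIZE = 500_000; // lines per chunk; tune to your heap

        // Phase 1: sort fixed-size chunks in memory and spill each to a temp file.
        static List<Path> splitAndSort(Path input, Comparator<String> cmp) throws IOException {
            List<Path> chunks = new ArrayList<>();
            try (BufferedReader in = Files.newBufferedReader(input)) {
                List<String> buffer = new ArrayList<>(CHUNK_SIZE);
                String line;
                while ((line = in.readLine()) != null) {
                    buffer.add(line);
                    if (buffer.size() == CHUNK_SIZE) {
                        chunks.add(writeSortedChunk(buffer, cmp));
                        buffer.clear();
                    }
                }
                if (!buffer.isEmpty()) chunks.add(writeSortedChunk(buffer, cmp));
            }
            return chunks;
        }

        static Path writeSortedChunk(List<String> buffer, Comparator<String> cmp) throws IOException {
            buffer.sort(cmp); // swap in your own sorting algorithm here
            Path tmp = Files.createTempFile("chunk", ".csv");
            return Files.write(tmp, buffer);
        }

        // Phase 2: k-way merge of the sorted chunks with a priority queue,
        // holding only one line per chunk in memory at a time.
        static void merge(List<Path> chunks, Path output, Comparator<String> cmp) throws IOException {
            List<BufferedReader> readers = new ArrayList<>();
            PriorityQueue<Map.Entry<String, Integer>> queue =
                    new PriorityQueue<>((a, b) -> cmp.compare(a.getKey(), b.getKey()));
            try (BufferedWriter out = Files.newBufferedWriter(output)) {
                for (Path chunk : chunks) {
                    BufferedReader r = Files.newBufferedReader(chunk);
                    readers.add(r);
                    String first = r.readLine();
                    if (first != null) queue.add(Map.entry(first, readers.size() - 1));
                }
                while (!queue.isEmpty()) {
                    Map.Entry<String, Integer> head = queue.poll();
                    out.write(head.getKey());
                    out.newLine();
                    String next = readers.get(head.getValue()).readLine();
                    if (next != null) queue.add(Map.entry(next, head.getValue()));
                }
            } finally {
                for (BufferedReader r : readers) r.close();
            }
        }

        public static void main(String[] args) throws IOException {
            // Example: sort by the int in the second column (index 1).
            Comparator<String> byId =
                    Comparator.comparingInt(line -> Integer.parseInt(line.split(",")[1]));
            List<Path> chunks = splitAndSort(Paths.get("data.csv"), byId);
            merge(chunks, Paths.get("sorted.csv"), byId);
        }
    }

The point is that only CHUNK_SIZE lines are ever in memory during phase 1, and only one line per chunk during the merge; you can plug your own sorting algorithm into writeSortedChunk to benchmark it.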

I would use an in-memory database such as H2 in in-memory mode ( jdbc:h2:mem: ), so everything stays in RAM and isn't flushed to disk (provided you have enough RAM; if not, you might want to use the file-based URL). Create your table in there and write every row from the CSV into it. Provided you set up the indexes properly, sorting and grouping will be a breeze with standard SQL. A rough sketch is below.
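A rough sketch, assuming the H2 jar is on the classpath (the table and column names are made up; the load uses H2's built-in CSVREAD function):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class H2SortExample {
        public static void main(String[] args) throws SQLException {
            // "jdbc:h2:mem:" keeps the whole database in RAM;
            // use a file-based URL like "jdbc:h2:/some/path" if it doesn't fit.
            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:");
                 Statement st = conn.createStatement()) {
                // Column names are placeholders for the question's 4 fields.
                st.execute("CREATE TABLE records(name VARCHAR, id INT, val DOUBLE, category VARCHAR)");
                // Bulk-load the file with H2's CSVREAD (the second argument
                // names the columns for a headerless file).
                st.execute("INSERT INTO records SELECT * FROM CSVREAD('data.csv', 'NAME,ID,VAL,CATEGORY')");
                st.execute("CREATE INDEX idx_id ON records(id)"); // index the columns you sort on
                try (ResultSet rs = st.executeQuery("SELECT * FROM records ORDER BY id")) {
                    while (rs.next()) {
                        // consume the sorted rows here
                    }
                }
            }
        }
    }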
