
Efficient Java collection for analysing a CSV file with millions of records

Let's say I have a CSV file with stock exchange information in the following format: timestamp, name, price, qty, account, buy/sell. The file may contain millions of records and represents the trading activity for the day. It is not sorted, and I need to choose the most suitable Java collection for holding this data so that I can run analytics on it efficiently.

Example analytics: 1) most sold stock; 2) account with the most transactions; 3) highest quantity of a stock bought in a given time range; 4) top K people with the highest number of transactions.

Basically, I will need to sort this list many times, each time by a different field.
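For concreteness, here is a minimal sketch of what one of these analytics could look like, assuming a hypothetical `Trade` record whose fields follow the CSV columns (the class and method names are my own, not from the question):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MostSoldStock {
    // Hypothetical record type for one CSV row; field names are assumptions.
    record Trade(long timestamp, String name, double price, int qty, String account, String side) {}

    // "Most sold stock": the name with the highest total quantity among SELL rows.
    static String mostSold(List<Trade> trades) {
        return trades.stream()
                .filter(t -> t.side().equals("SELL"))
                .collect(Collectors.groupingBy(Trade::name, Collectors.summingInt(Trade::qty)))
                .entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade(1L, "AAPL", 190.0, 100, "acc1", "SELL"),
                new Trade(2L, "MSFT", 410.0, 300, "acc2", "SELL"),
                new Trade(3L, "AAPL", 191.0, 50, "acc1", "BUY"));
        System.out.println(mostSold(trades)); // prints MSFT
    }
}
```

Note that an aggregation like this is a single O(n) pass, so not every analytic actually requires a full sort.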

After a bit of searching, I found that a tree-based collection such as a TreeMap seems best for this use case. Is there any other collection that would be better?

A TreeSet is efficient only if you want the data sorted by a single parameter. Instead, you can:

  1. Create a class like:

    public class Record {
        Calendar timeStamp; // or java.time.Instant
        String name;
        double price;
        //...
    }

  2. Create a comparator for each task.

  3. Create a LinkedList (or another List implementation):

    List<Record> records = new LinkedList<>();

  4. Sort with your comparators as needed:

    records.sort(yourComparator1);
    records.sort(yourComparator2);
    records.sort(yourComparator3);
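The steps above can be sketched as a complete example. The `Trade` record, the comparator names, and the sample data are assumptions for illustration:

```java
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;

public class SortByComparators {
    // Hypothetical record for one CSV row (a subset of the question's columns).
    record Trade(long timestamp, String name, double price, int qty) {}

    // One comparator per analytic; each sort is O(n log n).
    static final Comparator<Trade> BY_QTY_DESC = Comparator.comparingInt(Trade::qty).reversed();
    static final Comparator<Trade> BY_TIMESTAMP = Comparator.comparingLong(Trade::timestamp);

    public static void main(String[] args) {
        List<Trade> records = new LinkedList<>(List.of(
                new Trade(3L, "AAPL", 190.0, 50),
                new Trade(2L, "MSFT", 410.0, 300),
                new Trade(1L, "GOOG", 150.0, 100)));

        records.sort(BY_QTY_DESC);
        System.out.println(records.get(0).name()); // highest quantity first: MSFT

        records.sort(BY_TIMESTAMP);
        System.out.println(records.get(0).name()); // earliest timestamp first: GOOG
    }
}
```

For repeated whole-list sorting, an ArrayList is usually a better fit than a LinkedList: `List.sort` copies a LinkedList to an array internally anyway, so the linked structure buys nothing here.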
