Efficient Java collection to analyse the input from a CSV file with millions of records
Let's say I have a CSV file with stock exchange information in the following format: timestamp, name, price, qty, account, buy/sell. This file may have millions of records and represents the trading activity for the day. The file is not sorted, and I need to choose the most suitable Java collection for holding this data in order to provide analytics efficiently.
Example analytics: 1) most sold stock, 2) account with the most transactions, 3) highest quantity of a stock bought within a time range, 4) top K accounts with the highest number of transactions.
Basically, I will need to sort this list many times based on different fields.
After a little searching I found that a tree-based collection, like a TreeMap, seems best for this use case. Is there any other collection that would be better?
A TreeSet is efficient if you only need the data sorted by a single parameter. For multiple sort orders, you can define a class for the records:
public class Record {
    Calendar timeStamp;
    String name;
    double price;
    //...
}
Then create a comparator for each task, put the records into a LinkedList (or another List implementation), and sort it as needed:
List<Record> records = new LinkedList<>();
records.sort(yourComparator1);
records.sort(yourComparator2);
records.sort(yourComparator3);
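As a sketch of how this could look end to end, here is a minimal runnable example. The `Trade` class, its field names, and the sample values are illustrative assumptions (the question does not give a concrete schema), and `mostSoldStock` is one hypothetical way to implement analytic 1 with a single stream pass instead of a sort:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative record type mirroring the CSV columns (names are assumptions).
class Trade {
    long timestamp;
    String name;
    double price;
    long qty;
    String account;
    boolean buy;

    Trade(long timestamp, String name, double price, long qty, String account, boolean buy) {
        this.timestamp = timestamp;
        this.name = name;
        this.price = price;
        this.qty = qty;
        this.account = account;
        this.buy = buy;
    }
}

public class TradeAnalytics {

    // Analytic 1: most sold stock = sell trades grouped by name, quantities summed.
    static String mostSoldStock(List<Trade> trades) {
        Map<String, Long> soldByName = trades.stream()
                .filter(t -> !t.buy)
                .collect(Collectors.groupingBy(t -> t.name,
                        Collectors.summingLong(t -> t.qty)));
        return soldByName.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Trade> trades = new ArrayList<>();
        trades.add(new Trade(1, "AAPL", 150.0, 10, "acct1", true));
        trades.add(new Trade(2, "MSFT", 300.0, 5, "acct2", false));
        trades.add(new Trade(3, "AAPL", 151.0, 20, "acct1", false));

        // One comparator per sort order; the same list is re-sorted for each task.
        Comparator<Trade> byTime = Comparator.comparingLong(t -> t.timestamp);
        Comparator<Trade> byQtyDesc =
                Comparator.comparingLong((Trade t) -> t.qty).reversed();

        trades.sort(byQtyDesc);
        System.out.println(trades.get(0).name);    // stock with the largest single trade

        trades.sort(byTime);                       // restore chronological order
        System.out.println(mostSoldStock(trades)); // most sold stock by total quantity
    }
}
```

Note that each re-sort is O(n log n) over millions of records, so for aggregation-style analytics (most sold, top K accounts) a single pass building a `HashMap` of running totals, as in `mostSoldStock`, is usually cheaper than sorting the whole list.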