
Choosing the right sorted collection

I am in some doubt about which collection to use for our data.

The domain is this (example):

For each supermarket, we add a new item with a timestamp and a total amount to a collection each time a customer pays at the register.

We currently do this:

We have a Dictionary whose key is UniqueSupermarketID and whose value is a List<{timestamp, amount}>.

Each time a customer pays, we simply add a new item to the list for that specific supermarket.

We need to extract data from this dictionary as follows:

For a specified supermarket, get the newest cash register object whose timestamp equals "some timestamp".

We currently do this as:

supermarketDictionary["supermarket_01"]
    .OrderByDescending(i => i.TimeStamp)
    .FirstOrDefault(i => i.TimeStamp == someTimestamp);

This obviously starts performing like crap very quickly, so I am trying to figure out which collection to store the data in instead.

I am considering using a normal dictionary to hold the "supermarket id <-> cash register list" relation, and a SortedDictionary for the timestamp/amount entries, with the timestamps used as keys.

Is this the correct approach? I would of course need to implement IComparable correctly on the timestamp to get it to work right.
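For illustration, the layout described above might look roughly like the sketch below. It assumes the timestamp is a plain DateTime (which already implements IComparable, so no custom comparer would be needed) and that timestamps are unique within one supermarket, since they serve as keys. The RegisterStore class and its method names are hypothetical, not part of our code base.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: supermarket id -> (timestamp -> amount).
public static class RegisterStore
{
    private static readonly Dictionary<string, SortedDictionary<DateTime, decimal>> Store =
        new Dictionary<string, SortedDictionary<DateTime, decimal>>();

    // Records a payment; creates the per-supermarket dictionary on first use.
    public static void Add(string supermarketId, DateTime timeStamp, decimal amount)
    {
        SortedDictionary<DateTime, decimal> payments;
        if (!Store.TryGetValue(supermarketId, out payments))
        {
            payments = new SortedDictionary<DateTime, decimal>();
            Store[supermarketId] = payments;
        }
        // Assumption: one payment per timestamp; a duplicate timestamp overwrites.
        payments[timeStamp] = amount;
    }

    // Exact-timestamp lookup: O(1) average for the outer dictionary,
    // O(log n) for the sorted inner one, with no sorting per query.
    public static bool TryGet(string supermarketId, DateTime timeStamp, out decimal amount)
    {
        amount = 0m;
        SortedDictionary<DateTime, decimal> payments;
        return Store.TryGetValue(supermarketId, out payments)
            && payments.TryGetValue(timeStamp, out amount);
    }
}
```

If only exact-timestamp lookups were needed, a plain Dictionary keyed by timestamp would probably do as well; the sorted variant mainly matters when the entries also have to be enumerated in time order.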

Update 2014-01-03:

There are currently about 7 million rows in the list in question. The usages of the list in our system have been identified as these:

_states
    .OrderBy(x => x.TimeStamp)
    .FirstOrDefault(x => x.WtgId == wtgId && x.IsAvailable && x.TimeStamp >= timeStamp);

_states
    .Where(x => x.WtgId == wtgId && x.IsAvailable && x.TimeStamp >= timeStamp && x.TimeStamp <= endDateTime)
    .OrderBy(x => x.TimeStamp).ToList();

_states.Remove(state);

if (!_states.Contains(message))
    _states.Add(message);

Thanks,

/Jesper
Copenhagen, Denmark

EDIT: based on the update

All right, seeing what you really need certainly helps in making the right decision. If your data already arrives in order, there is no need for a sorted collection, and your four usages can be reduced to one:

Searching for one item that matches some criteria

  • adding with an existence check - adding is a cheap operation in a non-sorted collection, and the existence check is just a search for one item
  • removing by item is also at most one pass through the collection plus the remove operation itself, which is also quite cheap (though not in an array, if done many times)
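As a rough sketch of that reduction, assuming the items are appended in timestamp order, the first match found by a plain forward scan is already the earliest one, so the OrderBy call can go. The State class, its property types (WtgId as int, for example), and the StateList helper are hypothetical stand-ins for the real types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type; property names follow the queries in the question.
public sealed class State
{
    public int WtgId { get; set; }
    public bool IsAvailable { get; set; }
    public DateTime TimeStamp { get; set; }
}

public static class StateList
{
    // Add with an existence check: one linear search plus an amortized O(1) append.
    // Contains uses reference equality here because State does not override Equals.
    public static bool AddIfMissing(List<State> states, State message)
    {
        if (states.Contains(message)) return false;
        states.Add(message);
        return true;
    }

    // Search for one item. Assumption: the list is already in timestamp order,
    // so the first match in list order is the one with the smallest timestamp.
    public static State FirstAvailableFrom(List<State> states, int wtgId, DateTime timeStamp)
    {
        foreach (var s in states)
        {
            if (s.WtgId == wtgId && s.IsAvailable && s.TimeStamp >= timeStamp)
                return s;
        }
        return null;
    }

    // Remove by item: a linear search, then List<T> shifts the trailing elements.
    public static bool RemoveItem(List<State> states, State state)
    {
        return states.Remove(state);
    }
}
```

The range query from the update can be handled the same way: a single pass with the Where filter already yields the items in timestamp order if the list itself is in that order.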

Try using PLINQ and carefully measure how it performs against plain LINQ. With so many entries, the difference should be noticeable.

_states.AsParallel().FirstOrDefault(...);

It simply creates a few threads in the background; each of them searches part of the collection, and the results are merged at the end. The .NET Framework should choose the best number of threads for you, but if you feel like experimenting, append .WithDegreeOfParallelism(x), where x is the maximum number of threads it will use.
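A slightly fuller sketch of the PLINQ call is shown below. The ParallelSearch helper and its parameter names are made up for illustration. One caveat worth checking in your own measurements: without AsOrdered(), a parallel FirstOrDefault may return any matching item rather than the first one in list order, so the sketch adds AsOrdered() on the assumption that "first in timestamp order" matters to you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelSearch
{
    // Returns the first item in source order that satisfies the predicate,
    // or default(T) if none does. 'degree' caps the number of concurrent tasks.
    public static T FindFirst<T>(IList<T> items, Func<T, bool> predicate, int degree)
    {
        return items
            .AsParallel()
            .AsOrdered()                      // keep source order so "first" is well defined
            .WithDegreeOfParallelism(degree)  // optional; PLINQ picks a default otherwise
            .FirstOrDefault(predicate);
    }
}
```

A call might look like ParallelSearch.FindFirst(_states, x => x.WtgId == wtgId && x.IsAvailable && x.TimeStamp >= timeStamp, 4). Order preservation has a cost of its own, so it is worth timing both the ordered and the unordered variant against the sequential scan.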
