
.NET collection for fast filtering (sorted collection)

When profiling a very slow method, I discovered that the lag comes from searching and filtering a collection.

The method does the following, in order. According to the profiler, 80% of the time is spent on steps 1-3.

  1. Read a sorted collection from a file and deserialize it using protobuf-net (v2)
  2. Filter the sorted collection by a start and end integer (named .RangeFromTo())
  3. From the same sorted collection, get the next element (named .Right())
  4. Perform some task...

.RangeFromTo() filters for a given inclusive range, for example:

[3,7,9,12].RangeFromTo(2,9) -> [3,7,9]
[3,7,9,12].RangeFromTo(2,8) -> [3,7]
[3,7,9,12].RangeFromTo(7,13) -> [7,9,12]
[3,7,9,12].RangeFromTo(13,14) -> [ ]
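
As a rough sketch of the behaviour shown above, the range filter can be written as two binary searches over a plain sorted array. The names SortedRange, LowerBound and UpperBound are made up for this illustration and are not part of C5 or the .NET base library; the sketch assumes the array is already in ascending order.

```csharp
using System;

public static class SortedRange
{
    // Index of the first element >= value (sorted.Length if there is none).
    public static int LowerBound(int[] sorted, int value)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < value) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Index of the first element > value (sorted.Length if there is none).
    public static int UpperBound(int[] sorted, int value)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] <= value) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Inclusive range [from, to], returned as a view over the array (no copy).
    public static ArraySegment<int> RangeFromTo(int[] sorted, int from, int to)
    {
        int start = LowerBound(sorted, from);
        int end = UpperBound(sorted, to);
        return end > start
            ? new ArraySegment<int>(sorted, start, end - start)
            : new ArraySegment<int>(sorted, 0, 0);
    }
}
```

Each call costs O(log N) for the two searches, and returning an ArraySegment avoids allocating a new collection per query.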

.Right() finds an element in the collection and gives you the next one in the list. If the element doesn't exist, it gives you the closest one to the right. For example:

[3,7,9,12].Right(0) -> 3
[3,7,9,12].Right(3) -> 7
[3,7,9,12].Right(4) -> 7
[3,7,9,12].Right(12) -> null
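
One way to get this behaviour on a plain sorted array is Array.BinarySearch, which returns the bitwise complement of the index of the next larger element when the value is not found. The following is only a sketch: SortedLookup is a hypothetical helper name, and it assumes ascending order with no duplicate values.

```csharp
using System;

public static class SortedLookup
{
    // Smallest element strictly greater than value, or null if there is none.
    // Assumes 'sorted' is ascending and contains no duplicates.
    public static int? Right(int[] sorted, int value)
    {
        int i = Array.BinarySearch(sorted, value);
        // Found: take the following slot.
        // Not found: ~i is the index of the first element larger than value.
        int next = i >= 0 ? i + 1 : ~i;
        return next < sorted.Length ? sorted[next] : (int?)null;
    }
}
```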

Currently the collection is a SortedArray from C5 (https://github.com/sestoft/C5/). Is there a more suitable collection that I can use?

Note: Step 1 takes around 30% of the total time. If I use a List instead, protobuf actually takes 40% less time deserializing! I guess that when inserting into a SortedArray, the collection doesn't know the data is already sorted and does a whole bunch of extra work. The ideal collection (if it exists) should also be able to bypass that.
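
To illustrate the idea of bypassing the sort, a deserialized List can be checked for order in a single O(N) pass and then used directly as a sorted array. This is a sketch under the assumption that the file really does hold strictly ascending values; Presorted and its methods are hypothetical names, and the protobuf-net call itself is left out.

```csharp
using System;
using System.Collections.Generic;

public static class Presorted
{
    // True if every element is strictly greater than the one before it.
    public static bool IsStrictlyAscending(IReadOnlyList<int> items)
    {
        for (int i = 1; i < items.Count; i++)
            if (items[i - 1] >= items[i]) return false;
        return true;
    }

    // Copies the deserialized list into an array once, with an O(N) order
    // check instead of a sort. Throws if the input is not actually sorted.
    public static int[] ToSortedArray(List<int> deserialized)
    {
        if (!IsStrictlyAscending(deserialized))
            throw new ArgumentException("Input is not strictly ascending.", nameof(deserialized));
        return deserialized.ToArray();
    }
}
```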

Edit: To clarify, the lists hold around 1000-5000 elements each, and there are 90k different collections! The method in question needs to load all the collections into memory to perform some business task.

Edit 2: I have added a sample benchmark here:

https://github.com/cchanpromys/so_19188345

It compares SortedArray from C5 with SortedSet from .NET. So far the results are as follows:

C5 sorted array deserialize took 904
Sorted set deserialize took 1040
C5 sorted array .Right() took 5
Sorted set .Right() took 798    <--- Not sure what happened here...
C5 sorted array .RangeFromTo() took 217
Sorted set .RangeFromTo() took 140
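
The benchmark's SortedSet code is not reproduced here. Purely as an assumption about how such a lookup could be written, .Right() on a SortedSet<int> can be expressed with GetViewBetween and Min; SortedSetLookup is a hypothetical helper name.

```csharp
using System.Collections.Generic;

public static class SortedSetLookup
{
    // Smallest element strictly greater than value, or null if there is none.
    public static int? Right(SortedSet<int> set, int value)
    {
        if (value == int.MaxValue) return null; // nothing can be larger
        // View over the elements in [value + 1, int.MaxValue].
        SortedSet<int> view = set.GetViewBetween(value + 1, int.MaxValue);
        return view.Count > 0 ? view.Min : (int?)null;
    }
}
```

Whether a formulation like this accounts for the slow SortedSet .Right() timing cannot be told from the numbers above alone.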

Edit 3: This is not what I expected, but I ended up with a custom implementation of a list.

The problem I had is that SortedArray's Find operation (in general) takes O(log N), while I want it to be an O(1) operation.

Also, the list is sorted by nature; you will never add to the middle of the list.

So I ended up implementing a list that has an internal indexer array, for example:

indexer: [0,0,0,0,1,1,1,1,2,2]
list: [3,7,9]

So .Right(3) would be list[indexer[3] + 1].

The code can be found here.
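
The linked code is not reproduced here. As a minimal sketch of the indexer idea only, assuming non-negative, distinct, ascending values and the layout from the example above (indexer[v] holds the index of the first element >= v), a lookup could work as follows. IndexedSortedList is a hypothetical name, not the author's class.

```csharp
using System;

public sealed class IndexedSortedList
{
    private readonly int[] _list;    // ascending, distinct, non-negative values
    private readonly int[] _indexer; // _indexer[v] = index of the first element >= v

    public IndexedSortedList(int[] sortedDistinct)
    {
        if (sortedDistinct.Length > 0 && sortedDistinct[0] < 0)
            throw new ArgumentException("Values must be non-negative.", nameof(sortedDistinct));

        _list = sortedDistinct;
        int max = _list.Length == 0 ? -1 : _list[_list.Length - 1];
        _indexer = new int[max + 1];

        int pos = 0;
        for (int v = 0; v <= max; v++)
        {
            // Safe: the last element equals max, which is >= v.
            while (_list[pos] < v) pos++;
            _indexer[v] = pos;
        }
    }

    // Smallest element strictly greater than value, or null if there is none.
    public int? Right(int value)
    {
        if (_list.Length == 0) return null;
        if (value < 0) return _list[0];
        if (value >= _list[_list.Length - 1]) return null;
        // First element >= value + 1 is the first element > value.
        return _list[_indexer[value + 1]];
    }
}
```

The lookup is O(1), but the indexer needs one slot per integer up to the largest value, so memory grows with the value range rather than the element count. With 90k collections that trade-off is worth measuring.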

It is hard to believe that this type of list is not already implemented somewhere on the internet. If possible, I would like to use a library so I won't have to maintain my own list.

Does such an implementation exist on the internet?

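If your arrays are small enough (probably under 10-20 elements), there is a good chance that a simple linear search is good enough (in some cases a List will show up as faster in measurements), and you can narrow the range with Where / TakeWhile: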

  (new[]{3,7,9,12}).Where(i => i>= 2).TakeWhile(i => i <= 9)
