简体   繁体   中英

.net collection for fast filtering (sorted collection)

When profiling a very slow method, I discovered the lag is on searching and filtering of a collection.

The method does the following (in order). According to profiler, 80% of time are spend on step 1-3.

  1. Read sorted collection from a file and deserialize using Protobuf-net (v2)
  2. From a sorted collection, filter based on a start and end integer (name .RangeFromTo() )
  3. From the same sorted collection, get the next element of the collection (name . Right() )
  4. Perform some task...

.RangeFromTo() filters for a given range, for example:

[3,7,9,12].RangeFromTo(2,9) -> [3,7,9]
[3,7,9,12].RangeFromTo(2,8) -> [3,7]
[3,7,9,12].RangeFromTo(7,13) -> [7,9,12]
[3,7,9,12].RangeFromTo(13,14) -> [ ]

.Right() finds an element in the collection and gives you the next on in the list. If the element doesn't exist it gives you the closest one counting to the right. For example:

[3,7,9,12].Right(0) -> 3
[3,7,9,12].Right(3) -> 7
[3,7,9,12].Right(4) -> 7
[3,7,9,12].Right(12) -> null

Currently the collection is using SortedArray from C5 ( https://github.com/sestoft/C5/ ). Is there a more suitable collection that I can use?

Note: Step 1. takes around 30% of the total time. If I use a List instead, protobuf actually takes 40% less time deserializing! I guess when inserting into an SortedArray the collection doesn't know the data is already sorted and is doing a whole bunch of work. The ideal collection (if exist) should also be able to bypass that.

Edit: To clarify, the list are around 1000-5000 and there are 90k different collections! The method in question needs to load all the collections in memory to perform some business task.

Edit 2: I have added some sample benchmark here:

https://github.com/cchanpromys/so_19188345

It compares SortedArray from C5 with SortedSet from .Net. So far the results are as follows:

C5 sorted array deserialize took 904
Sorted set deserialize took 1040
C5 sorted array .Right() took 5
Sorted set .Right() took 798    <--- Not sure what happened here...
C5 sorted array .RangeFromTo() took 217
Sorted set .RangeFromTo() took 140

Edit 3 This is out of my expectations but I ended up with a custom implementation of a list.

The problem that I had is that SortedArray's Find operation (in general) takes O(Log(N)), while I want it to be an O(1) operation.

Also, the list is sorted by nature, you will never add to the middle of the list.

So I ended up implementing a list that has an internal indexer array, for example:

For example:

indexer: [0,0,0,0,1,1,1,1,2,2]
list: [3,7,9]

So .Right(3) would be list[indexer[3]++] .

The code can be found here .

It is hard to believe that this type of list is not already implemented somewhere on the internet. If possible, I would like to use a library so I won't have to manage my own list.

Do such implementation exist on the internet?

如果您的阵列足够小(可能低于10-20个元素),那么简单的线性搜索很有可能足够好(在某些情况下, List会在测量中更快地显示)并且您可以使用Where / TakeWhile缩小范围:

  (new[]{3,7,9,12}).Where(i => i>= 2).TakeWhile(i => i <= 9)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM