
Re-sort a vector after a small number of elements have been modified

If we have a vector of size N that was previously sorted, and then replace up to M elements with arbitrary values (where M is much smaller than N), is there an easy way to re-sort it at lower cost (i.e. generate a sorting network of reduced depth) than a full sort?

For example, if N = 10 and M = 2 the input might be

10 20 30 40 999 60 70 80 90 -1

Note: the indices of the modified elements are not known (until we compare them with the surrounding elements).


Here is an example where I know the solution because the input size is small and I was able to find it with a brute-force search:

If N = 5 and M = 1, these would be the valid inputs:

0 0 0 0 0     0 0 1 0 0     0 1 0 0 0     0 1 1 1 0     1 0 0 1 1     1 1 1 1 0

0 0 0 0 1     0 0 1 0 1     0 1 0 0 1     0 1 1 1 1     1 0 1 1 1     1 1 1 1 1

0 0 0 1 0     0 0 1 1 0     0 1 0 1 1     1 0 0 0 0     1 1 0 1 1

0 0 0 1 1     0 0 1 1 1     0 1 1 0 1     1 0 0 0 1     1 1 1 0 1

For example the input may be 0 1 1 0 1 if the previously sorted vector was 0 1 1 1 1 and the 4th element was modified, but there is no way to form 0 1 0 1 0 as a valid input, because it differs in at least 2 elements from any sorted vector.
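The set of valid inputs can be enumerated mechanically. A short Python sketch (the function name is mine) that lists every 0/1 vector within Hamming distance M of some sorted 0/1 vector:

```python
from itertools import product

def valid_inputs(n, m):
    """All 0/1 vectors of length n within Hamming distance m of a sorted 0/1 vector."""
    sorted_vecs = [(0,) * (n - k) + (1,) * k for k in range(n + 1)]
    valid = set()
    for v in product((0, 1), repeat=n):
        # v is valid if some sorted vector differs from it in at most m positions
        if any(sum(a != b for a, b in zip(v, s)) <= m for s in sorted_vecs):
            valid.add(v)
    return sorted(valid)

print(len(valid_inputs(5, 1)))  # → 22, the number of vectors in the table above
```

For N = 5, M = 1 this reproduces the 22 vectors listed above; 0 1 1 0 1 is among them and 0 1 0 1 0 is not.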

This would be a valid sorting network for re-sorting these inputs:

>--*---*-----*-------->
   |   |     | 
>--*---|-----|-*---*-->
       |     | |   |
>--*---|-*---*-|---*-->
   |   | |     |
>--*---*-|-----*---*-->
         |         |
>--------*---------*-->

We do not care that this network fails to sort some invalid inputs (e.g. 0 1 0 1 0).

And this network has depth 4, a saving of 1 compared with the general case (a depth of 5 is generally necessary to sort an arbitrary 5-element vector).
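Since all inputs here are 0/1 vectors, the network above can be checked exhaustively. A Python sketch, with the comparator list transcribed from the diagram (my transcription; a comparator (i, j) routes the min to wire i and the max to wire j):

```python
from itertools import product

# Depth-4 layers read off the diagram above:
# [(0,1),(2,3)], [(0,3),(2,4)], [(0,2),(1,3)], [(1,2),(3,4)]
NETWORK = [(0, 1), (2, 3), (0, 3), (2, 4), (0, 2), (1, 3), (1, 2), (3, 4)]

def apply_network(net, vec):
    """Run a comparator network over a sequence, returning the result as a list."""
    v = list(vec)
    for i, j in net:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def is_valid(v, m=1):
    """Within Hamming distance m of some sorted 0/1 vector."""
    n = len(v)
    return any(sum(a != b for a, b in zip(v, (0,) * (n - k) + (1,) * k)) <= m
               for k in range(n + 1))

for v in product((0, 1), repeat=5):
    if is_valid(v):
        assert apply_network(NETWORK, v) == sorted(v)
print("network sorts every valid input")

# ...while invalid inputs may stay unsorted:
print(apply_network(NETWORK, (0, 1, 0, 1, 0)))  # → [0, 0, 1, 0, 1]
```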

Unfortunately the brute-force approach is not feasible for larger input sizes.

Is there a known method for constructing a network to re-sort a larger vector?

My N values will be on the order of a few hundred, with M not much more than √N.

OK, I'm posting this as an answer since the length restriction on comments drives me nuts :)

You should try this out:

  • implement a simple sequential sort working on local memory (insertion sort or something similar); if you don't know how, I can help with that
  • have only a single work-item perform the sorting on each chunk of N elements
  • calculate the maximum size of local memory per work-group (call clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE) and derive the maximum number of work-items per work-group, because with this approach your number of work-items will most likely be limited by the amount of local memory
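The sequential sort in the first bullet could be a plain insertion sort, which is a good fit here because its cost is roughly O(N + D), where D is the number of inversions, so it is cheap on a nearly sorted array. A sketch in Python rather than OpenCL C, just to show the algorithm the kernel would implement:

```python
def insertion_sort(a):
    """In-place insertion sort; near-linear when the input is nearly sorted."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        # Shift larger elements right until x's slot is found.
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

print(insertion_sort([10, 20, 30, 40, 999, 60, 70, 80, 90, -1]))
# → [-1, 10, 20, 30, 40, 60, 70, 80, 90, 999]
```

With M displaced elements the inner loop runs only for those M elements (each possibly shifting up to N slots), so the example above costs O(N + M·N) comparisons rather than O(N log N) of a full sort with large constants.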

I suspect this will work rather well, because:

  • a simple sort may be perfectly fine, especially since the array is already largely sorted
  • parallelizing over such a small number of items is not worth the trouble (using local memory, however, is!)
  • since you're processing billions of such small arrays, you will achieve great occupancy even if each array is handled by a single work-item

Let me know if you have problems with my ideas.

EDIT 1:

I just realized I used a technique that may be confusing: my proposal to use local memory is not about synchronization or about using multiple work-items per input vector/array. I simply use it to get low read/write memory latency. Since we use rather large chunks of memory, I fear that using private memory could cause swapping to slow global memory without us realizing it. This also means you have to allocate local memory for each work-item: each work-item accesses its own chunk of local memory and uses it for sorting (exclusively). I'm not sure how good this idea is, but I've read that using too much private memory may cause swapping to global memory, and that the only way to notice is by looking at the performance (not sure if I'm right about this).

Here is an algorithm which should yield very good sorting networks. Probably not the absolute best network for all input sizes, but hopefully good enough for practical purposes.

  1. store (or have available) pre-computed networks for n < 16
  2. sort the largest 2^k elements with a good network, e.g. a bitonic sorter for the largest power of 2 less than or equal to n
  3. for the remaining elements, repeat #2 until m < 16, where m is the number of still-unsorted elements
  4. use a known optimal network from #1 to sort any remaining elements
  5. merge the smallest and second-smallest sub-lists using a merging network
  6. repeat #5 until only one sorted list remains

All of these steps can be done symbolically ("on paper"), with the comparisons recorded into a master network instead of being applied to the data.

It is worth pointing out that the (bitonic) networks from #2 can be run in parallel, and the smaller ones will finish first. This is good, because as they finish, the networks from #5-6 can begin to execute.
