简体   繁体   中英

Dutch national flag - Deducing solution

After learning elementary sorts, selection sort , insertion sort & heap sort from sedgewick course in Coursera.

Selection sort - final sorted order starts forming every iteration

Insertion sort - Ascending order(not final) starts forming every iteration.

Heap sort - Aggressive version of insertion sort.


I have an assignment, in the above mentioned course, after learning sort algorithms(shown above),

Dutch national flag . Given an array of n buckets, each containing a red, white, or blue pebble, sort them by color. The allowed operations are:

swap(i,j) : swap the pebble in bucket i with the pebble in bucket j .

color(i) : color of pebble in bucket i .

The performance requirements are as follows:

At most n calls to color() .

At most n calls to swap() .

Constant extra space.


By reading this question,

From the performance requirements, given, I get a hint that n swaps are required(at-most). From this hint, I think about,

1) Insertion sort that ensures n swaps for n inversion pairs.

2) Am assuming colors that are stored in buckets, there is very great chance that, similar colors sit closer, given n buckets. This leads to a situation of partially sorted array, where an array is partially sorted, if the number of inversions <= cn .

Based on these above two conclusions,

Question:

1)

Is the thought process correct in deducing to insertion sort, as solution?

2)

What is the hash function to hash colors and insert colors in as partially sorted array? Do I need linear probing method?

3) Why do we need extra space, as given in question? Because Insertion sort is in-place algorithm

Note: To confirm my thought process, before writing code

1) Insertion sort that ensures n swaps for n inversion pairs.

2) Am assuming colors that are stored in buckets, there is very great chance that, similar colors sit closer, given n buckets. This leads to a situation of partially sorted array, where an array is partially sorted, if the number of inversions <= cn .

There are two problems with this line of reasoning.

The first is that the number of inversions is not O( n ) in the worst case. To see why, imagine that you have n / 2 blue pebbles followed by n / 2 red pebbles (and no white pebbles). Then the total number of inversions is ( n 2 ) / 4 . (And furthermore, judging from some quick experiments I've just run, even the average-case total number of inversions is ( n 2 ) / 6 — still not O( n ).)

The second problem with this line of reasoning is that you're asked to limit your number of swaps to exactly n , not O( n ). So even if insertion sort only required (say) 2n swaps, that would still be too many.


2) [¶] What is the hash function to hash colors and insert colors in as partially sorted array? Do I need linear probing method?

Insertion sort does not require a hash function; and I don't think you can build any meaningful sort of hash table within your constraint of constant additional space.

Rather, you can just represent the three colors as (say) 0, 1, and 2.

3) Why do we need extra space, as given in question? Because Insertion sort is in-place algorithm

Even insertion sort requires some extra variables, such as array-indices.


Alexander Anikin has already suggested the same solution that I was going to suggest, but I'm not sure if it's clear from his answer how you could come up with that on your own, so the rest of this answer will walk you through that.


So, suppose you start by trying insertion sort. We can then improve it incrementally until it satisfies the requirements. (This does require having a few insights, but each individual improvement requires only a small insight, rather than having to solve the whole problem at once.)

First improvement: # of calls to color(i) .

When we want to insert a new pebble into the already-sorted portion of the array, we need to find where it should go in that portion.

With pure insertion sort, we do that by examining pebbles in the already-sorted portion of the array. But we're only allowed to call color exactly n times; and we expect to have to call it at least once for each pebble — you can't sort a pebble without examining it — which means that to make this work, we need to call it exactly once for each pebble. We can't afford to re-call it for a pebble that we've already sorted.

To fix this, the key insight is that in the already-sorted part of the array, we don't actually need to examine a bucket in order to know what color pebble it has, provided we've kept track of how many pebbles we've already seen of each color. If we've previously seen r red pebbles, w white pebbles, and b blue pebbles, then the red pebbles are in buckets 0 through r-1 , the white ones are in r through r+w-1 , and the blue ones are in buckets r+w through r+w+b-1 . So we can figure out where to insert a new pebble without having to actually call color on any other pebbles.

Second improvement: # of calls to swap(i,j) , part 1.

The next problem is that with pure insertion sort, inserting an pebble means "shifting" all the pebbles after it, which potentially means a large number of swaps. (For example, if we've already seen 10 white pebbles and 5 blue pebbles, then inserting a red pebble means "shifting" 15 pebbles, which requires 15 swaps.)

To fix this, the key insight is that we don't actually care about the order of the pebbles within a color-group; the red pebbles can come in any order, followed by the white pebbles in any order, followed by the blue pebbles in any order. So, rather than shifting a large number of pebbles by just one position, we can shift a single pebble from the start of a color-group to the end of it.

For example, if we've previously seen r red pebbles, w white pebbles, and b blue pebbles, and we need to insert a new red pebble, then we can swap it into position r (giving us a white pebble (if w > 0 )), then swap the white pebble into position r+w (giving us a blue pebble (if b > 0 )).

This brings us down to O( n ) swaps in the worst case, and just under n swaps in the average case: a red pebble will require two swaps (if w > 0 and b > 0 ), a white pebble will require one swap (if b > 0 ), and a blue pebble will require zero swaps.

Third improvement: # of calls to swap(i,j) , part 2.

But this still leaves us with potentially too many swaps: if there are more red pebbles than blue ones, then we'll (usually) end up with more than n swaps.

To fix this, we need one last insight, which is that we know where blue pebbles will end up (since we don't care about the order they end up); so instead of putting blue pebbles immediately after the end of the already-seen white pebbles, we can put blue pebbles at the end of the array. That way we don't need to keep re-shifting the range of blue pebbles; once a blue pebble is placed, it can stay put.

So, for example, if we've seen r red pebbles, w white pebbles, and b blue pebbles, then the red pebbles should be in positions 0 through r-1 , the white pebbles should be in positions r through r+w-1 , and the blue pebbles should be in positions nb through n-1 . The positions from r+w through nb-1 are the as-yet-unsorted range.


So, our final algorithm looks like this:

  • initialize r , w , and b to 0 .
  • initialize i to 0 .
  • while i < n - b :
    • check color(i) :
      • if it's RED, then swap(i, r) and increment r and i .
      • if it's WHITE, then just increment w and i .
      • if it's BLUE, then swap(i, nb-1) and increment b . Do not increment i (because the pebble in position i is one that we just pulled from near the end of the array, that we haven't examined yet).

(Note that a few swaps can be eliminated by double-checking whether the two positions being swapped are actually different; but, that's not needed. We're under n swaps either way.)


Now, what if we had started with a different sort, rather than insertion sort? Some kinds of sorts would still take us in the right direction; for example, if we started with a bucket sort (or a sort of radix sort with only a single digit 0-2), we could still have made incremental fixes to end up at the above algorithm. Other kinds of sorts would not have worked out so well, but still might have led to useful insights that would have helped once we got on a better path. So it's important to try different things and keep an open mind. The more you explore a problem, the more insights you get, even if you occasionally go down the wrong path and need to back out and try something new.

To elaborate solution for this problem, you don't need mentioned algorithms.

Instead become familiar with partition routine implementations of Quick Sort - it would really helpful.

PS

Heap sort - Aggressive version of insertion sort

No, you are wrong, it is completely different approach

You can consider how Shell sort does exploit Insert sort internally

At first glance it looks like introduction to bucket sort: move every red pebble into Reds array, move every white pebble into Whites array, move every blue pebble into Blues array, and then copy these 3 arrays into final output array. Naive implementation will take 3* n calls of color() , 2* n calls of swap() and n +constant additional memory.

But restrictions are a bit off limits... We can not afford simple decisions, we can not afford additional memory for temporary arrays.

Note that n is the length of input array, not number of permutations and not multiplied by something. That is:

  • each call of swap() should place one pebble at its final place
  • swaps should be done in-place
  • each call of color() should lead to appropriate call of swap()

It's possible in case of only 3 colors. To make it possible we can divide input array to 3(4) sections:

r) red pebbles in the beginning of array

w) white pebbles follow red pebbles from the beginning

u) unprocessed pebbles follow white pebbles up to blue pebbles

b) blue pebbles in the end of array

Let's assume we use 0-based indexing of arrays. So, index a[0] is first element of array and a[n-1] is last element of array.

Now to keep track of array state we can use few variables:

  • variable r will hold number of processed red pebbles

  • variable w will hold number of processed white pebbles

  • variable b will hold number of processed blue pebbles

  • variable i will hold current index of pebble to process(processed)

This can be invariant for processing loop. At start r=0, w=0, b=0, i=0. At end r, w, b hold number of corresponding pebbles and i=nb.

So, array will look like this in the beginning:

uuuuuuuuuuuu
^           ^
r           n=n-0=n-b

Array will look like this during processing:

rrwwwuuuuuuub
  ^  ^      ^
  r  w+r    n-b 
     i

Arrya will look like this in the end:

rrrwwwwwwbbbb
^  ^     ^   ^
0  r     n-b n
         w+r
         i

Each step of processing loop (i is incremented after this) we make decision and do one of 3 possible things:

  • if a[i] is red, we make swap(r, i) and increase r
  • if a[i] is white, we increase w - it's useless action, actually, as we don't really use w, but I think it's better do this for execise
  • if a[i] is blue, we increase b and make swap(i, nb) - we can skip swaps if i=nb, but generally algorithm should work without this skip

Loop finishes once i>=nb

Now, for your questions:

Is the thought process correct in deducing to insertion sort, as solution?

Nope, insertion sort will do more than n calls of color() and more than n calls of swap()

What is the hash function to hash colors and insert colors in as partially sorted array?

You did not specify order of colors, so I've decided it myself. But you don't need hash for pebbles, since it will only increase number of swaps if you put pebbles into temporary data structure

Do I need linear probing method?

It's tricky part - one call of color() per input pebble (no more than 1). You can store color() result in local variable and then use local variable in, say, case operator. Strictly said, it is wrong - you still do more comparisions of color than needed, but probably your teacher will accept this approach, since you still call color() only that number of times. Hash is not needed, really.

Still, you can try array or member function references if your programming language allows it, indexed by pebble colors. Each function should do appropriate steps for processing loop decision as described above.

Since you did not specify PL and how colors are represented, it's hard to suggest how this can be done effectively.

Why do we need extra space, as given in question? Because Insertion sort is in-place algorithm

Even insertion sort uses some extra space for local variables.

PS I do not write ready-to-use code as it's your execise. Feel free to update answer with your code if something goes wrong when you implement it in code.

I don't think you should (or need to) assume that closer buckets have higher chance of having the same color. The problem can be solved maintaining indices of the first occurrence of two out of the three colors (the ones not considered "first" in the ordering). Replacing colors with values 0, 1, 2, I believe this code satisfies the requirements:

public class Z {
  private static int first1, first2;

  private static void swap(int[] v, int i, int j) {
    int tmp = v[i];
    v[i] = v[j];
    v[j] = tmp;
  }

  private static void sort(int[] v) {
    first1 = -1;
    first2 = -1;
    for(int i=0; i<v.length; i++) {
      switch(v[i]) {
        case 0:
          if (first1 != -1) {
            swap(v, i, first1);
            if (first2 != -1) {
              swap(v, i, first2);
              first2++;
            }
            first1++;
          } else if (first2 != -1) {
            swap(v, i, first2);
            first2++;
          }
          break;
        case 1:
          if (first2 != -1) {
            swap(v, i, first2);
            if (first1 == -1) {
              first1 = first2;
            }
            first2++;
          } else if (first1 == -1) {
            first1 = i;
          }
          break;
        case 2:
          if (first2 == -1) {
            first2 = i;
          }
          break;
      }
    }
  }
  public static String toString(int[] v) {
    StringBuilder sb = new StringBuilder();
    sb.append("[");
    for(int i=0; i<v.length; i++) {
      sb.append(v[i]);
      sb.append(" ");
    }
    sb.append("]");
    return sb.toString();
  }

  public static void main(String[] args) {
    int c = Integer.parseInt(args[0]);
    int[] v = new int[c];
    while(true) {
      int[] tmp = v.clone();
      System.out.print(toString(tmp) + "=>");
      sort(tmp);
      System.out.println(toString(tmp));
      int prev = -1;
      for(int j=0; j<tmp.length; j++) {
        if (tmp[j] < prev) {
          throw new Error();
        }
      }
      int j;
      for(j=0; j<c; j++) {
        v[j]++;
        if (v[j] == 3) {
          v[j] = 0;
        } else {
          break;
        }
      }
      if (j == c) {
        break;
      }
   }
  }
}

As a hint, Wikipedia says the following, emphasis mine:

The Dutch national flag problem is a computer science programming problem proposed by Edsger Dijkstra. The flag of the Netherlands consists of three colors: red, white and blue. Given balls of these three colors arranged randomly in a line (the actual number of balls does not matter), the task is to arrange them such that all balls of the same color are together and their collective color groups are in the correct order.

The solution to this problem is of interest for designing sorting algorithms; in particular, variants of the quicksort algorithm that must be robust to repeated elements need a three-way partitioning function that groups items less than a given key (red), equal to the key (white) and greater than the key (blue). Several solutions exist that have varying performance characteristics, tailored to sorting arrays with either small or large numbers of repeated elements.

(For the solution, the article also has pseudocode for the algorithm)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM