简体   繁体   中英

Lexicographic minimum permutation such that all adjacent letters are distinct

It's a bonus school task for which we didn't receive any teaching yet and I'm not looking for a complete code, but some tips to get going would be pretty cool. Going to post what I've done so far in Java when I get home, but here's something I've done already.

So, we have to do a sorting algorithm, which for example sorts "AAABBB" to the ABABAB. Max input size is 10^6, and it all has to happen under 1 second. If there's more than one answer, the first one in alphabetical order is the right one. I started to test different algorithms to even sort them without that alphabetical order requirement in mind, just to see how the things work out.

First version:

Save the ascii codes to the Integer array where index is the ascii code, and the value is amount which that character occurs in the char array. Then I picked 2 highest numbers, and started to spam them to the new character array after each other, until some number was higher, and I swapped to it. It worked well, but of course the order wasn't right.

Second version:

Followed the same idea, but stopped picking the most occurring number and just picked the indexes in the order they were in my array. Works well until the input is something like CBAYYY. Algorithm sorts it to the ABCYYY instead of AYBYCY. Of course I could try to find some free spots for those Y's, but at that point it starts to take too long.

An interesting problem, with an interesting tweak. Yes, this is a permutation or rearranging rather than a sort. No, the quoted question is not a duplicate.

Algorithm.

  1. Count the character frequencies.
  2. Output alternating characters from the two lowest in alphabetical order.
  3. As each is exhausted, move to the next.
  4. At some point the highest frequency char will be exactly half the remaining chars. At that point switch to outputting all of that char alternating in turn with the other remaining chars in alphabetical order.

Some care required to avoid off-by-one errors (odd vs even number of input characters). Otherwise, just writing the code and getting it to work right is the challenge.


Note that there is one special case, where the number of characters is odd and the frequency of one character starts at (half plus 1). In this case you need to start with step 4 in the algorithm, outputting all one character alternating with each of the others in turn.

Note also that if one character comprises more than half the input then apart for this special case, no solution is possible. This situation may be detected in advance by inspecting the frequencies, or during execution when the tail consists of all one character. Detecting this case was not part of the spec.


Since no sort is required the complexity is O(n). Each character is examined twice: once when it is counted and once when it is added to the output. Everything else is amortised.

You start by count each number of letter you have in your array:

For example you have 3 - A, 2 - B, 1 - C, 4 - Y, 1 - Z.

1) Then you put each time the lowest one (it is A), you can put.

so you start by :

A

then you can not put A any more so you put B:

AB

then:

ABABACYZ

These works if you have still at least 2 kind of characters. But here you will have still 3 Y.

2) To put the last characters, you just go from your first Y and insert one on 2 in direction of beginning.(I don't know if these is the good way to say that in english).

So ABAYBYAYCYZ.

3) Then you take the subsequence between your Y so YBYAYCY and you sort the letter between the Y :

BAC => ABC

And you arrive at

ABAYAYBYCYZ

which should be the solution of your problem.

To do all this stuff, I think a LinkedList is the best way

I hope it help :)

My idea is the following. With the right implementation it can be almost linear.

First establish a function to check if the solution is even possible. It should be very fast. Something like most frequent letter > 1/2 all letters and take into cosideration if it can be first.

Then while there are still letters remaining take the alphabetically first letter that is not the same as previous, and makes further solution possible.

The correct algorithm would be the following:

  1. Build a histogram of the characters in the input string.
  2. Put the CharacterOccurrences in a PriorityQueue / TreeSet where they're ordered on highest occurrence, lowest alphabetical order
  3. Have an auxiliary variable of type CharacterOccurrence
  4. Loop while the PQ is not empty

    1. Take the head of the PQ and keep it
    2. Add the character of the head to the output
    3. If the auxiliary variable is set => Re-add it to the PQ
    4. Store the kept head in the auxiliary variable with 1 occurrence less unless the occurrence ends up being 0 (then unset it)
  5. if the size of the output == size of the input, it was possible and you have your answer. Else it was impossible.

Complexity is O(N * log(N))

Make a bi directional table of character frequencies: character->count and count->character . Record an optional<Character> which stores the last character (or none of there is none). Store the total number of characters.

If (total number of characters-1)<2*(highest count character count), use the highest count character count character. (otherwise there would be no solution). Fail if this it the last character output.

Otherwise, use the earliest alphabetically that isn't the last character output.

Record the last character output, decrease both the total and used character count.

Loop while we still have characters.

While this question is not quite a duplicate, the part of my answer giving the algorithm for enumerating all permutations with as few adjacent equal letters as possible readily can be adapted to return only the minimum, as its proof of optimality requires that every recursive call yield at least one permutation. The extent of the changes outside of the test code are to try keys in sorted order and to break after the first hit is found. The running time of the code below is polynomial (O(n) if I bothered with better data structures), since unlike its ancestor it does not enumerate all possibilities.

david.pfx's answer hints at the logic: greedily take the least letter that doesn't eliminate all possibilities, but, as he notes, the details are subtle.

from collections import Counter
from itertools import permutations
from operator import itemgetter
from random import randrange


def get_mode(count):
    return max(count.items(), key=itemgetter(1))[0]


def enum2(prefix, x, count, total, mode):
    prefix.append(x)
    count_x = count[x]
    if count_x == 1:
        del count[x]
    else:
        count[x] = count_x - 1
    yield from enum1(prefix, count, total - 1, mode)
    count[x] = count_x
    del prefix[-1]


def enum1(prefix, count, total, mode):
    if total == 0:
        yield tuple(prefix)
        return
    if count[mode] * 2 - 1 >= total and [mode] != prefix[-1:]:
        yield from enum2(prefix, mode, count, total, mode)
    else:
        defect_okay = not prefix or count[prefix[-1]] * 2 > total
        mode = get_mode(count)
        for x in sorted(count.keys()):
            if defect_okay or [x] != prefix[-1:]:
                yield from enum2(prefix, x, count, total, mode)
                break


def enum(seq):
    count = Counter(seq)
    if count:
        yield from enum1([], count, sum(count.values()), get_mode(count))
    else:
        yield ()


def defects(lst):
    return sum(lst[i - 1] == lst[i] for i in range(1, len(lst)))


def test(lst):
    perms = set(permutations(lst))
    opt = min(map(defects, perms))
    slow = min(perm for perm in perms if defects(perm) == opt)
    fast = list(enum(lst))
    assert len(fast) == 1
    fast = min(fast)
    print(lst, fast, slow)
    assert slow == fast


for r in range(10000):
    test([randrange(3) for i in range(randrange(6))])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM