
String manipulation algorithm to find string greater than original string

I have a few words (strings) such as 'hefg', 'dhck', 'dkhc' and 'lmno'. Each is to be converted to a new word by swapping some or all of its characters, such that the new word is lexicographically greater than the original word and is also the smallest of all the words greater than the original. For example, 'dhck' should output 'dhkc', and not 'kdhc', 'dchk' or any other.

I have these inputs:

hefg
dhck
dkhc
fedcbabcd

which should output:

hegf
dhkc
hcdk
fedcbabdc

I tried the Python code below. It worked for all inputs except 'dkhc' and 'fedcbabcd'. I figured out that in 'fedcbabcd' the first character is the maximum, so nothing gets swapped, and I get "ValueError: min() arg is an empty sequence".

How can I modify the algorithm to fix these cases?

list1=['d','k','h','c']
list2=[]
maxVal=list1.index(max(list1))
for i in range(maxVal):
    temp=list1[maxVal]
    list1[maxVal]=list1[i-1]
    list1[i-1]=temp
    list2.append(''.join(list1))
print(min(list2))

You can try something like this:

  • iterate over the characters of the string in reverse order
  • keep track of the characters you've already seen, and where you saw them
  • if you've seen a character larger than the current character, swap the current character with the smallest such larger character
  • sort all the characters after that position to get the minimum string

Example code:

def next_word(word):
    word = list(word)
    seen = {}
    for i in range(len(word)-1, -1, -1):
        if any(x > word[i] for x in seen):
            x = min(x for x in seen if x > word[i])
            word[i], word[seen[x]] = word[seen[x]], word[i]
            return ''.join(word[:i+1] + sorted(word[i+1:]))
        if word[i] not in seen:
            seen[word[i]] = i

for word in ["hefg", "dhck", "dkhc", "fedcbabcd"]:
    print(word, next_word(word))

Result:

hefg hegf
dhck dhkc
dkhc hcdk
fedcbabcd fedcbabdc

The max character and its position don't influence the algorithm in the general case. For example, for 'fedcbabcd', you could prepend an a or a z to the string and it wouldn't change the fact that you need to swap the final two letters.
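A quick way to convince yourself of this is a brute-force check. The sketch below is only an illustration: the helper name brute_force_next and the shortened word 'babcd' (the tail of 'fedcbabcd', chosen so that enumerating every permutation stays cheap) are assumptions made here, not part of the question.

```python
from itertools import permutations


def brute_force_next(word):
    """Smallest rearrangement of word that is greater than word, or None.

    Enumerates every permutation, so it is only usable for short words.
    """
    candidates = {"".join(p) for p in permutations(word)}
    greater = [c for c in candidates if c > word]
    return min(greater) if greater else None


# Prepending a small or a large character leaves the required swap unchanged.
for prefix in ("", "a", "z"):
    word = prefix + "babcd"
    print(word, brute_force_next(word))
```

In all three cases only the final two letters change ('babdc', 'ababdc', 'zbabdc').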

Consider the input 'dgfecba'. Here, the output is 'eabcdfg'. Why? Notice that the final six letters are sorted in decreasing order, so by changing anything there you get a lexicographically smaller string, which is no good. It follows that you need to replace the initial 'd'. What should we put in its place? We want something greater than 'd', but as small as possible, so 'e'. What about the remaining six letters? Again, we want a string that's as small as possible, so we sort the letters lexicographically: 'eabcdfg'.

So the algorithm is:

  • start at the back of the string (right end);
  • go left while the symbols do not decrease;
  • let i be the rightmost position where s[i] < s[i + 1]; in our case, that's i = 0;
  • leave the symbols at positions 0, 1, ..., i - 1 untouched;
  • find the position among i+1 ... n-1 containing the least symbol that's greater than s[i]; call this position j; in our case, j = 3;
  • swap s[i] and s[j]; in our case, we obtain 'egfdcba';
  • reverse the string s[i+1] ... s[n-1]; in our case, we obtain 'eabcdfg'.
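The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function name next_greater and the choice to return None when no greater arrangement exists are assumptions made for illustration.

```python
def next_greater(s):
    """Return the smallest rearrangement of s that is greater than s, or None."""
    chars = list(s)
    n = len(chars)

    # Go left from the right end while the symbols do not decrease,
    # stopping at the rightmost i with chars[i] < chars[i + 1].
    i = n - 2
    while i >= 0 and chars[i] >= chars[i + 1]:
        i -= 1
    if i < 0:
        return None  # the whole string is non-increasing: nothing greater exists

    # The tail chars[i+1:] is non-increasing, so the first symbol from the
    # right that exceeds chars[i] is the least symbol greater than chars[i].
    j = n - 1
    while chars[j] <= chars[i]:
        j -= 1

    chars[i], chars[j] = chars[j], chars[i]

    # After the swap the tail is still non-increasing; reversing it sorts it
    # in ascending order, which gives the smallest possible tail.
    chars[i + 1:] = reversed(chars[i + 1:])
    return "".join(chars)


print(next_greater("dgfecba"))  # eabcdfg
```

Scanning for j from the right also picks the rightmost occurrence when the least greater symbol appears more than once, which keeps the tail non-increasing so that a plain reversal is enough.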

Your problem can be reworded as finding the next lexicographical permutation of a string.

The algorithm in the above link is described as follows:

1) Find the longest non-increasing suffix

2) The element to the left of the suffix is our pivot

3) Find the right-most successor of the pivot in the suffix

4) Swap the successor and the pivot

5) Reverse the suffix

The above algorithm is especially interesting because it runs in O(n) time.

Code

def next_lexicographical(word):
    word = list(word)

    # Find the pivot and the successor
    pivot = next(i for i in range(len(word) - 2, -1, -1) if word[i] < word[i+1])
    successor = next(i for i in range(len(word) - 1, pivot, -1) if word[i] > word[pivot])

    # Swap the pivot and the successor
    word[pivot], word[successor] = word[successor], word[pivot]

    # Reverse the suffix
    word[pivot+1:] = word[-1:pivot:-1]

    # Reform the word and return it
    return ''.join(word)

The above algorithm will raise a StopIteration exception if the word is already the last lexicographical permutation.
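If an exception is not what you want in that case, one option is to pass a default value to next() when looking for the pivot. The variant below is a sketch under that assumption; the name next_lexicographical_or_none and the None return value are illustrative choices, not part of the original answer.

```python
def next_lexicographical_or_none(word):
    """Like the function above, but return None instead of raising."""
    chars = list(word)

    # next() accepts a default that is returned when the generator is empty,
    # which happens exactly when the word is already the last permutation.
    pivot = next(
        (i for i in range(len(chars) - 2, -1, -1) if chars[i] < chars[i + 1]),
        None,
    )
    if pivot is None:
        return None

    # A successor always exists here, because chars[pivot + 1] > chars[pivot].
    successor = next(
        i for i in range(len(chars) - 1, pivot, -1) if chars[i] > chars[pivot]
    )

    chars[pivot], chars[successor] = chars[successor], chars[pivot]
    chars[pivot + 1:] = chars[-1:pivot:-1]
    return "".join(chars)


for word in ["dkhc", "lmno", "dcba"]:
    print(word, next_lexicographical_or_none(word))
```

For 'dcba', which is already the largest arrangement of its letters, this prints None rather than raising.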

Example

words = [
    'hefg',
    'dhck',
    'dkhc',
    'fedcbabcd'
]

for word in words:
    print(next_lexicographical(word))

Output

hegf
dhkc
hcdk
fedcbabdc


 