
Check if two sorted strings are equal in O(log n) time

I need to write a Python function which takes two sorted strings (the characters in each string are in increasing alphabetical order) containing only lowercase letters, and checks whether or not the strings are equal. The function's time complexity needs to be O(log n), where n is the length of each string.

I can't figure out how to check it without comparing each character in the first string with the corresponding character of the second string.

This is, in fact, possible in O(log n) time in the worst case, since the strings are formed from an alphabet of constant size.

You can do 26 binary searches on each string to find the left-most occurrence of each letter. Each search takes O(log n) time, and the number of searches is a constant, so the total is still O(log n). If the strings are equal, then all 26 binary searches will give the same results: either the letter exists in neither string, or its left-most occurrence is at the same index in both strings.
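As a minimal sketch of one such search, the function below finds the left-most occurrence of a letter in a sorted string. The name `leftmost_index` and the use of -1 to mean "letter not present" are assumptions made for illustration, not part of the original answer.

```python
def leftmost_index(s, ch):
    """Return the index of the first occurrence of ch in the sorted string s, or -1."""
    lo, hi = 0, len(s)
    # Invariant: every character before lo is < ch, every character from hi on is >= ch.
    while lo < hi:
        mid = (lo + hi) // 2
        if s[mid] < ch:
            lo = mid + 1
        else:
            hi = mid
    # lo is now the first position whose character is >= ch (or len(s) if there is none).
    return lo if lo < len(s) and s[lo] == ch else -1


print(leftmost_index("aabbbd", "b"))  # 2
print(leftmost_index("aabbbd", "c"))  # -1
```

The loop halves the search range on every iteration, so a single call inspects O(log n) characters.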

Conversely, if all of the binary searches give the same result, then the strings must be equal, because (1) the alphabet is fixed, (2) the indices of the left-most occurrences determine the frequency of each letter in the string, and (3) the strings are sorted, so the letter frequencies uniquely determine the string.
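To make points (2) and (3) concrete, here is a small sketch that rebuilds a sorted string from nothing but its length and the 26 left-most indices. The helper name `rebuild` and the -1 marker for absent letters are hypothetical; `str.find` is used only to produce the indices for the demonstration and is not the O(log n) search.

```python
from string import ascii_lowercase


def rebuild(starts, n):
    """Rebuild a sorted lowercase string of length n from its left-most indices.

    starts[k] is the left-most index of the k-th letter, or -1 if it is absent.
    """
    # Letters that occur, in alphabetical order, paired with where their run begins.
    present = [(i, ch) for i, ch in zip(starts, ascii_lowercase) if i != -1]
    pieces = []
    for k, (start, ch) in enumerate(present):
        # A letter's run ends where the next present letter's run begins, or at n.
        end = present[k + 1][0] if k + 1 < len(present) else n
        pieces.append(ch * (end - start))
    return "".join(pieces)


s = "aabbbd"
starts = [s.find(ch) for ch in ascii_lowercase]  # linear-time lookup, for illustration only
print(rebuild(starts, len(s)) == s)  # True
```

Because the string can be recovered this way, two sorted strings of the same length with identical left-most indices cannot differ.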

I'm assuming here that the strings have the same length. If they might not, then check that first and return False if the lengths are different. Getting the length of a string takes O(1) time.
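Putting the length check and the 26 searches together, one possible implementation looks like the sketch below. It uses `bisect_left` from the standard library for the binary search; the function names `leftmost` and `sorted_strings_equal` are assumptions, and the code relies on both inputs really being sorted and containing only lowercase ASCII letters.

```python
from bisect import bisect_left
from string import ascii_lowercase


def leftmost(s, ch):
    """Left-most index of ch in the sorted string s, or -1 if ch does not occur."""
    i = bisect_left(s, ch)  # first position whose character is >= ch
    return i if i < len(s) and s[i] == ch else -1


def sorted_strings_equal(a, b):
    """Compare two sorted lowercase strings using a constant number of binary searches."""
    if len(a) != len(b):  # O(1) in Python
        return False
    # 26 letters, one binary search per string per letter: O(log n) overall.
    return all(leftmost(a, ch) == leftmost(b, ch) for ch in ascii_lowercase)


print(sorted_strings_equal("aabbbd", "aabbbd"))  # True
print(sorted_strings_equal("aabbbd", "aabbdd"))  # False
print(sorted_strings_equal("abc", "abcd"))       # False
```

If the inputs are not sorted, the binary searches give meaningless results, so this is not a general-purpose replacement for `a == b`.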


As @wim notes in the comments, this solution cannot be generalised to lists of numbers; it only works with strings. When you have an algorithmic problem involving strings, the alphabet size is usually a constant, and this fact can often be exploited to achieve a better time complexity than would otherwise be possible.

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address.

 