
String-matching first n letters of two strings

For a problem I'm working on, I would like to know how long the common sequence of two strings is, starting from index 0. An example makes this clearer:

  • I would like the method to return 4 if the two strings are "Yellowstone" and "Yelling", meaning the first 4 characters of the two strings match ("Yell").

Is there a more (time-)efficient way to do this than simply iterating over the two words? Could I make use of a built-in method of some sort? (For my task I want to avoid importing any custom libs.)

One approach is to use binary search over the prefix length. Note that this does not bring the complexity down from O(n) to O(log n), where n is the length of the shorter string: a match at a single index says nothing about the characters before it. For "aXcde" and "aYcde" the characters at index 2 match, yet the common prefix is only "a". Each probe therefore has to verify a whole range of characters, so the total work is still O(n) in the worst case.

The approach itself is simple. Keep lo as the length of the prefix known to match and hi as the largest prefix length still possible. Probe a mid between them. If the characters in the range [lo, mid) match, move lo up to mid. Otherwise move hi down to mid - 1. When lo and hi meet, return lo as your answer.

Edit

Adding function for better understanding.

int lengthOfFirstSimilarCharacters(String str1, String str2) {
    int lo = 0;                                       // prefix length known to match
    int hi = Math.min(str1.length(), str2.length());  // largest possible prefix length
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;             // round up so that lo always advances
        // A match at one index proves nothing about earlier characters,
        // so verify the whole range [lo, mid).
        if (str1.regionMatches(lo, str2, lo, mid - lo)) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}

You don't have to iterate through both texts. Iterate over the shorter one, compare the characters at the same index, and break as soon as you find a mismatch.

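int commonPrefixLength(String a, String b) {   // e.g. a = "Yellow", b = "Yelling"
    String smaller = (a.length() < b.length()) ? a : b;
    int ret = 0;
    for (int index = 0; index < smaller.length(); index++) {
        // compare the characters at the same index; stop at the first mismatch
        if (a.charAt(index) != b.charAt(index)) {
            break;
        }
        ret++;
    }
    return ret;
}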

// Use charAt together with equalsIgnoreCase if you want the comparison to be case-insensitive: String.valueOf(a.charAt(index)).equalsIgnoreCase(String.valueOf(b.charAt(index)))
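The case-insensitive variant mentioned above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes that folding each char with Character.toLowerCase is an acceptable notion of "same letter" (String.equalsIgnoreCase applies slightly different folding rules for a few Unicode characters).

```java
final class CaseInsensitivePrefix {
    // Hypothetical helper, not part of the original answer.
    static int commonPrefixLengthIgnoreCase(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        // Advance while both strings have a character at i and the
        // lower-cased characters are equal.
        while (i < limit
                && Character.toLowerCase(a.charAt(i)) == Character.toLowerCase(b.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(commonPrefixLengthIgnoreCase("YELLOWSTONE", "yelling")); // prints 4
    }
}
```

Comparing chars directly avoids creating two temporary String objects per index, which the String.valueOf form does.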

Note:

The answer by Sachin Chauhan takes a different route and uses binary search to locate the first difference. Keep in mind that any correct method has to compare every character of the common prefix, so a binary search cannot beat a plain loop asymptotically.

I will leave my answer here as the simpler solution in terms of programmer time, which is usually what matters when the strings are relatively short.

Here is the original answer:

As it's a simple loop, I doubt any built-in method would save much "programmer" time (and certainly not enough run time to be worth mentioning).

For the record, I know of no such Java method (perhaps some external library has one, but you've stated you'd prefer to avoid them).

Reference code would be something along these lines, I'd imagine:

public int longestCommonPrefixLength(String s1, String s2) {

    if (s1 == null || s1.length() == 0 || s2 == null || s2.length() == 0) {
        return 0;
    }

    int commonPrefixLength = 0;

    for (int i = 0; i < Math.min(s1.length(), s2.length()); i++) {
        if (s1.charAt(i) == s2.charAt(i)) {
            commonPrefixLength++;
        } else {
            break;
        }
    }

    return commonPrefixLength;
}

As we see, even with all the verbosity of Java and my "clarity" style, it's still just 18 lines of code. :)

Giving up some clarity, you can even shorten the for loop to:

for (int i = 0; i < Math.min(s1.length(), s2.length()) && s1.charAt(i) == s2.charAt(i); i++, commonPrefixLength++);

for 6 fewer lines.

To take it to the (correct) extreme:

public int longestCommonPrefixLength2(String s1, String s2) {
    if (s1 == null || s1.length() == 0 || s2 == null || s2.length() == 0) return 0;
    int i = 0;
    for (; i < Math.min(s1.length(), s2.length()) && s1.charAt(i) == s2.charAt(i); i++);
    return i;
}

6 LOC :)

Something curious, by the way:

The String class has a boolean regionMatches(int toffset, String other, int ooffset, int len) method, which internally does pretty much the above up to a given len. You could also iteratively increase len until it no longer returns true, but that would of course be nowhere near as efficient.
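For illustration only, the "iteratively increase len" idea could look like the sketch below. The class and method names are hypothetical. Because every call re-compares the prefix from index 0, the total work grows quadratically with the prefix length, which is why the plain loop above is preferable.

```java
final class RegionMatchesPrefix {
    // Hypothetical helper: grows len one step at a time using String.regionMatches.
    static int commonPrefixLength(String s1, String s2) {
        int limit = Math.min(s1.length(), s2.length());
        int len = 0;
        // Keep growing while the first len + 1 characters of both strings still match.
        while (len < limit && s1.regionMatches(0, s2, 0, len + 1)) {
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        System.out.println(commonPrefixLength("Yellowstone", "Yelling")); // prints 4
    }
}
```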

Using Streams

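    import java.util.stream.IntStream;

    String s1 = "Yellow";
    String s2 = "Yelling";
    // Only indices that exist in both strings can be compared.
    int limit = Math.min(s1.length(), s2.length());
    // The index of the first mismatch equals the length of the common prefix.
    int ret = IntStream.range(0, limit)
                .filter(i -> s1.charAt(i) != s2.charAt(i))
                .findFirst().orElse(limit);
    // If there is no mismatch, the shorter string is itself the common prefix, so ret is limit.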
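As for the built-in method the question asks about: assuming Java 9 or later is available, java.util.Arrays.mismatch returns the index of the first differing element of two arrays, which is the common prefix length. The sketch below is one possible use of it, not something the answers above rely on. The class and method names are hypothetical, and toCharArray copies both strings, so it trades some memory for brevity.

```java
import java.util.Arrays;

final class MismatchPrefix {
    // Hypothetical helper built on Arrays.mismatch (Java 9+).
    static int commonPrefixLength(String s1, String s2) {
        int m = Arrays.mismatch(s1.toCharArray(), s2.toCharArray());
        // Arrays.mismatch returns -1 only when the arrays are identical;
        // in that case the whole string is the common prefix.
        return m == -1 ? s1.length() : m;
    }

    public static void main(String[] args) {
        System.out.println(commonPrefixLength("Yellowstone", "Yelling")); // prints 4
    }
}
```

When one string is a proper prefix of the other, Arrays.mismatch returns the length of the shorter array, so that case needs no special handling.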
