简体   繁体   中英

String-matching first n letters of two strings

So for a problem I'm facing I would like to know how long a sequence (starting from index 0) two strings are 'the same' - I think it'd be clearer to just give an example;

  • I would like the method to return 4 if the two strings are "Yellowstone" and "Yelling" - meaning, the first 4 characters of the two strings match ("Yell")

Is there any more (time-)efficient way to go about this than to just iterate over the two words? Could I make use of some built-in method of some sort? (For my task I want to avoid importing any custom libs)

I think the fastest approach would be to use Binaray Search , which will give you O(logn) complexity instead of O(n). Here n is the length of smallest string.

The approach is simple in binary search. Look for similarity end for the index character in both the strings. For example if i is your index then check i+1 for dis-similarity character where character at i index is similar. And if that is the case break, return i as your answer. Or else keep on searching in sub-scope.

Edit

Adding function for better understanding.

int lengthOfFirstSimilarCharacters(String str1, String str2) {
    int strlen1 = str1.length();
    int strlen2 = str2.length();
    if(strlen1 > strlen2){
        return lengthOfFirstSimilarCharacters(str2,str1);
    }
    int i = 0;
    int j = strlen1-1;
    while(i<=j){
        int mid = i + (j-i)/2;
        if(str1.charAt(mid) == str2.charAt(mid)) {
            if(mid+1<strlen1 && str1.charAt(mid+1) != str2.charAt(mid+1)){
                return mid+1;
            }
            i = mid+1;
        }else{
            j = mid-1;
        }
    }
    return i;
}

You dont have to iterate through both texts. Iterate through the smaller one and compare character at same index. break as and when you find a mismatch

String a ="Yellow";
String b= "Yelling";
String smaller = (a.length < b.length) ? a:b;
int ret =0;
for (index based on smaller ){
  compare character using charAt and if matching ret++, else break;
}
return ret;

//use charAt along with equalsIgnoreCase ifu want it to be case insensitive. String.valueOf(a.charAt(index)).equalsIgnoreCase(String.valueOf(b.charAt(index)))

Correction:

The answer by Sachin Chauhan is indeed correct and better at runtime (ie using binary search to search for the first difference).

I will leave my answer to allow for a simpler solution programmer-time, for the cases where the length is of no great influence (ie relatively short strings), but a simple solution would be preferable.

Here is the original answer:

As it's a simple loop, I doubt any inbuilt method will be much of a "programmer"-time improvement (and definitely not much of run-time improvement worth to mention).

For the record, I know of no such Java method (perhaps some external library, but you've stated you'd prefer to avoid them).

Reference code would be something along these lines, I'd imagine:

public int longestCommonPrefixLength(String s1, String s2) {

    if (s1 == null || s1.length() == 0 || s2 == null || s2.length() == 0) {
        return 0;
    }

    int commonPrefixLength = 0;

    for (int i = 0; i < Math.min(s1.length(), s2.length()); i++) {
        if (s1.charAt(i) == s2.charAt(i)) {
            commonPrefixLength++;
        } else {
            break;
        }
    }

    return commonPrefixLength;
}

As we see, with all the verbosity of Java and my "clarity" style, it's still just 18 lines of code. :)

Relaxing some clarity, you can even shorten the for to:

for (int i = 0; i < Math.min(s1.length(), s2.length()) && s1.charAt(i) == s2.charAt(i); i++, commonPrefixLength++);

for 6 lines less.

To take it to the (correct) extreme:

public int longestCommonPrefixLength2(String s1, String s2) {
    if (s1 == null || s1.length() == 0 || s2 == null || s2.length() == 0) return 0;
    int i = 0;
    for (; i < Math.min(s1.length(), s2.length()) && s1.charAt(i) == s2.charAt(i); i++);
    return i;
}

6 LOC :)

Something curious, by the way:

String class has boolean regionMatches(int toffset, String other, int ooffset, int len) method (which does internally pretty much the above up to a given len ) - you could also iteratively increase len until it no longer returns true, but that would not be anywhere near same efficiency, of course.

Using Streams

    String s1 = "Yellow";
    String s2 = "Yelling";
    int limit = (s1.length() > s2.length() ? s2.length() : s1.length()) - 1;
    int ret = IntStream.range(0, limit)
                .filter(i -> s1.charAt(i) != s2.charAt(i))
                .findFirst().orElse(-1);
    //-1 if the Strings are the same.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM