
Substring difference between two strings

Given two strings of length n, P = p_1...p_n and Q = q_1...q_n, we define M(i, j, k) as the number of mismatches between p_i...p_{i+k-1} and q_j...q_{j+k-1}. In set notation, M(i, j, k) is the size of the set { 0 <= x < k | p_{i+x} != q_{j+x} }.

Given an integer K, your task is to find the maximum length L such that there exists a pair of indices (i, j) for which M(i, j, L) <= K. Of course, we must also have i+L-1 <= n and j+L-1 <= n.

Input

The first line of input contains a single integer T (1 <= T <= 10). T test cases follow. Each test case consists of an integer K and two strings P and Q separated by a single space.

Output

For each test case, output a single integer L: the maximum value for which there exists a pair of indices (i, j) such that M(i, j, L) <= K.

Constraints

0 <= K <= length of the string P
P and Q have the same length
Each string has at most 1500 characters
All characters in P and Q are lower-case English letters

Sample Input

3
2 tabriz torino
0 abacba abcaba
3 helloworld yellomarin

Sample Output

4
3
8 

Explanation: First test case: If we take "briz" from the first string and "orin" from the second string, the number of mismatches between these two substrings is 2, and each substring has length 4. That is, we have chosen i=3, j=2, L=4, and we have M(3,2,4) = 2.

Second test case: Since K=0, we must find the longest common substring of the two input strings. We can choose "aba", and the two strings have no longer common substring, so the answer for this test case is 3. That is, we have chosen i=1, j=4, and L=3, and we have M(1,4,3)=0.

Third test case: We can choose "hellowor" from the first string and "yellomar" from the second string. So we have chosen i=1, j=1, and L=8, and we have M(1,1,8)=3. We could also choose i=2, j=2, and L=8, which still gives M(2,2,8)=3.
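
The three explanations above can be checked directly from the definition of M. The following is a minimal sketch, not part of the original problem statement; the class name MismatchCheck and the helper m are hypothetical, and m takes 1-based indices to match the notation used above.

```java
class MismatchCheck {

    // M(i, j, k) from the problem statement, with 1-based i and j:
    // counts the positions x in [0, k) where p_{i+x} differs from q_{j+x}.
    static int m(String p, String q, int i, int j, int k) {
        int mismatches = 0;
        for (int x = 0; x < k; x++) {
            // charAt is 0-based, hence the "- 1"
            if (p.charAt(i - 1 + x) != q.charAt(j - 1 + x)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(m("tabriz", "torino", 3, 2, 4));         // 2
        System.out.println(m("abacba", "abcaba", 1, 4, 3));         // 0
        System.out.println(m("helloworld", "yellomarin", 1, 1, 8)); // 3
        System.out.println(m("helloworld", "yellomarin", 2, 2, 8)); // 3
    }
}
```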

Here is my implementation:

import java.io.*;
import java.util.*;

class Solution {

    public static int mismatch(String a, String b, int ii, int jj, int xx) {
        int i, j = 0;
        for (i = 0; i < xx; i++) {
            if (a.charAt(ii) != b.charAt(jj)) {
                j++;
            }
            ii++;
            jj++;
        }
        return j;
    }

    public static boolean find(int x, String a, String b, int kx) {
        int nn = a.length();
        for (int i = 0; i <= (nn - x); i++) {
            for (int j = 0; j <= (nn - x); j++) {
                int k;
                k = mismatch(a, b, i, j, x);
                if (k <= kx) { // accept any pair with at most kx mismatches
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String args[]) throws IOException {
        Scanner scanner = new Scanner(System.in);
        int t = scanner.nextInt();
        while (t > 0) {
            int k, n;
            String a, b;
            k = scanner.nextInt();
            a = scanner.next();
            b = scanner.next();
            n = a.length();
            int i = (n + k) / 2;
            int st = k, en = n;
            // This condition is true whenever k != n; the loop ends via the breaks below.
            while (i != k || i != n) {
                boolean ch = false, chh = false;
                ch = find(i, a, b, k);
                if (i != n) {
                    chh = find(i + 1, a, b, k);
                }
                if (i == n && ch == true) {
                    System.out.println(i);
                    break;
                }
                if (ch == true && chh == false) {
                    System.out.println(i);
                    break;
                }
                if (ch) {
                    st = i;
                    i = (i + en + 1) / 2;
                } else {
                    en = i;
                    i = (st + i) / 2;
                }
            }
            t--;
        }
    }
}

The above implementation takes 5.1 seconds for input strings of length 1500, but the time limit for Java is 5 seconds. If anyone can improve this code, please share your thoughts.

Your code doesn't take 5.1s on the site. The judge stops running your code as soon as it exceeds the time limit, so the real running time might even be minutes. Even if you micro-optimize this algorithm, you will again see 5.1s in the details section. So work on your algorithm, not on optimization!
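
As an illustration of what working on the algorithm could look like, here is a sketch of a sliding-window approach; it is not taken from the answer above, and the names SubstringDiff, longestOnDiagonal, and longest are hypothetical. It relies on the observation that p_{i+x} is always compared with q_{j+x}, so every candidate pair of substrings lies on one diagonal (fixed j - i). On each diagonal a window grows to the right and shrinks from the left whenever it holds more than K mismatches, which needs on the order of n^2 character comparisons per test case.

```java
import java.util.Scanner;

class SubstringDiff {

    // Longest run along one diagonal, comparing p[i + t] with q[j + t],
    // that contains at most k mismatches (i and j are 0-based start offsets).
    static int longestOnDiagonal(String p, String q, int i, int j, int k) {
        int len = Math.min(p.length() - i, q.length() - j);
        int best = 0, mismatches = 0, left = 0;
        for (int right = 0; right < len; right++) {
            if (p.charAt(i + right) != q.charAt(j + right)) {
                mismatches++;
            }
            // Shrink the window from the left until it is valid again.
            while (mismatches > k) {
                if (p.charAt(i + left) != q.charAt(j + left)) {
                    mismatches--;
                }
                left++;
            }
            best = Math.max(best, right - left + 1);
        }
        return best;
    }

    // Maximum L such that some pair (i, j) has M(i, j, L) <= k.
    static int longest(int k, String p, String q) {
        int best = 0;
        for (int d = 0; d < p.length(); d++) {
            best = Math.max(best, longestOnDiagonal(p, q, d, 0, k)); // diagonals starting in P
            best = Math.max(best, longestOnDiagonal(p, q, 0, d, k)); // diagonals starting in Q
        }
        return best;
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        if (!scanner.hasNextInt()) {
            return; // no input supplied
        }
        int t = scanner.nextInt();
        while (t-- > 0) {
            int k = scanner.nextInt();
            String p = scanner.next();
            String q = scanner.next();
            System.out.println(longest(k, p, q));
        }
    }
}
```

For the sample input in the question, longest returns 4, 3, and 8, matching the sample output.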

You could make a boolean array compare[n][n], where compare[i][j] = (a[i] == b[j]), and later use it instead of repeating the comparisons. You'll have far fewer comparisons and much less addressing.

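// Assumes a static boolean[][] compare filled in beforehand,
// with compare[i][j] == (a.charAt(i) == b.charAt(j)).
public static int mismatch(String a, String b, int ii, int jj, int xx) {
    int i, j = 0;
    for (i = 0; i < xx; i++) {
        if (!compare[ii][jj]) {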
            j++;
        }
        ii++;
        jj++;
    }
    return j;
}
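
The snippet above assumes the compare table already exists. A minimal sketch of how it could be built once per test case follows; the class name CompareTable and the method build are hypothetical, and here the table is passed as a parameter rather than stored in a static field. For n = 1500 the table holds 2,250,000 entries. It replaces the repeated charAt calls with array lookups but leaves the number of loop iterations unchanged.

```java
class CompareTable {

    // compare[i][j] is true exactly when a.charAt(i) == b.charAt(j) (0-based).
    static boolean[][] build(String a, String b) {
        boolean[][] compare = new boolean[a.length()][b.length()];
        for (int i = 0; i < a.length(); i++) {
            for (int j = 0; j < b.length(); j++) {
                compare[i][j] = a.charAt(i) == b.charAt(j);
            }
        }
        return compare;
    }

    // Counts mismatches between a[ii .. ii+xx-1] and b[jj .. jj+xx-1]
    // using only table lookups.
    static int mismatch(boolean[][] compare, int ii, int jj, int xx) {
        int count = 0;
        for (int x = 0; x < xx; x++) {
            if (!compare[ii + x][jj + x]) {
                count++;
            }
        }
        return count;
    }
}
```

For example, CompareTable.mismatch(CompareTable.build("tabriz", "torino"), 2, 1, 4) compares "briz" with "orin" and returns 2.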
