
How would you find how many times one array is repeated in another one?

For example, given {1,2} as the small array and {1,2,3,4,1,2,1,3} as the big one, the method should return 2.

This is probably horribly incorrect:

public static int timesOccur(int[] small, int big[]) {
    int sum= 0; 
    for (int i=0; i<small.length; i++){
        int currentSum = 0;
        for (int j=0; j<big.length; j++){       
            if (small[i] == big[j]){
                currentSum ++;
            }   
            sum= currentSum ;
        }
    }
    return sum;
}

As @AndyTurner mentioned, your task reduces to string matching, for which there is a set of well-known algorithms.

As far as I understand, you want a solution faster than O(n * m), where n is the length of the big array and m is the length of the small one.

There are two main approaches. The first involves preprocessing the text (the long array); the second involves preprocessing the search pattern (the small array).

  1. Preprocessing the text. By this I mean building a suffix array, together with an LCP array, from your longer array. Once this data structure is built, you can use binary search to find your subarray. The best time you can achieve is O(n) to build the structure and O(m + log n) to perform the search, so the overall time is O(n + m).

  2. Preprocessing the pattern. This means constructing a DFA from the pattern. Once the DFA is built, a single traversal of the text (the long array) finds all occurrences of the pattern, in linear time. The hardest part is constructing the DFA. Knuth-Morris-Pratt does this in O(m) time, so the overall running time is O(m + n). KMP is most probably the best available solution for this task in terms of both efficiency and implementation complexity. See @JuanLopes's answer for a concrete implementation.
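To make the first approach (preprocessing the text) more concrete, here is a minimal sketch. It is not the O(n) construction mentioned above: it builds the suffix array naively by sorting suffix start indices, which is much slower in the worst case, and it does not use an LCP array. The class and method names (`SuffixArrayCount`, `buildSuffixArray`, `bound`, `count`) are made up for this illustration. Two binary searches locate the contiguous block of suffixes that start with the small array, and the size of that block is the number of occurrences.

```java
import java.util.Arrays;

class SuffixArrayCount {
    // Naive suffix array: sort suffix start indices lexicographically.
    // Assumption: simplicity over speed; this is not a linear-time construction.
    static Integer[] buildSuffixArray(int[] text) {
        Integer[] sa = new Integer[text.length];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> compareSuffixes(text, a, b));
        return sa;
    }

    static int compareSuffixes(int[] text, int a, int b) {
        while (a < text.length && b < text.length) {
            if (text[a] != text[b]) return Integer.compare(text[a], text[b]);
            a++;
            b++;
        }
        // The suffix that ran out first is the shorter one and sorts first.
        return Integer.compare(text.length - a, text.length - b);
    }

    // Compares the first pattern.length elements of the suffix at 'start' with the pattern.
    // Returns 0 when the pattern is a prefix of that suffix.
    static int comparePrefix(int[] text, int start, int[] pattern) {
        for (int k = 0; k < pattern.length; k++) {
            if (start + k >= text.length) return -1; // suffix is shorter than the pattern
            if (text[start + k] != pattern[k]) return Integer.compare(text[start + k], pattern[k]);
        }
        return 0;
    }

    // Binary search: first position whose suffix is >= pattern (upper == false)
    // or strictly greater than every suffix starting with the pattern (upper == true).
    static int bound(int[] text, Integer[] sa, int[] pattern, boolean upper) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(text, sa[mid], pattern);
            if (c < 0 || (upper && c == 0)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    static int count(int[] small, int[] big) {
        if (small.length == 0 || small.length > big.length) return 0;
        Integer[] sa = buildSuffixArray(big);
        return bound(big, sa, small, true) - bound(big, sa, small, false);
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // 2
        System.out.println(count(new int[]{1, 1}, new int[]{1, 1, 1}));                // 2
    }
}
```

The preprocessing cost only pays off if the same big array is searched for many different small arrays; for a single query, the second approach is simpler.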

You can also consider an optimized brute-force approach such as Boyer-Moore. It performs well in practice, but its worst-case running time is O(n * m).
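As a rough illustration of that family of algorithms, the sketch below uses the Horspool simplification of Boyer-Moore (only a bad-character shift table), not the full Boyer-Moore algorithm. Because the elements are arbitrary int values rather than characters from a small alphabet, the shift table is kept in a `HashMap`; that choice, and the class name `HorspoolCount`, are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class HorspoolCount {
    static int count(int[] small, int[] big) {
        int m = small.length, n = big.length;
        if (m == 0 || m > n) return 0;

        // For every value in small[0..m-2], remember the distance from its
        // rightmost occurrence to the last position of the pattern.
        Map<Integer, Integer> shift = new HashMap<>();
        for (int k = 0; k < m - 1; k++) {
            shift.put(small[k], m - 1 - k);
        }

        int count = 0;
        int i = 0; // start of the current window in big
        while (i <= n - m) {
            // Compare the window with the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && small[j] == big[i + j]) j--;
            if (j < 0) count++;
            // Shift based on the value under the last position of the window;
            // values that do not occur in small[0..m-2] allow a full shift of m.
            i += shift.getOrDefault(big[i + m - 1], m);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // 2
        System.out.println(count(new int[]{1, 1}, new int[]{1, 1, 1}));                // 2
    }
}
```

When most windows end in a value that does not appear in the small array, the search skips m positions at a time; on highly repetitive input it degrades to the same O(n * m) behaviour as the plain nested loops.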

UPD:

In case you don't need a fast approach, I corrected your code to match the description:

    public static int timesOccur(int[] small, int big[]) {
        int sum = 0;
        for (int i = 0; i < big.length - small.length + 1; i++) {
            int j = 0;
            while (j < small.length && small[j] == big[i + j]) {
                j++;
            }
            if (j == small.length) {
                sum++;
            }
        }
        return sum;
    }

Pay attention to the inner while loop. It stops as soon as elements don't match. This is an important optimization, as it makes the running time almost linear in the best case.
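To see the difference between the best and the worst case, one can count element comparisons. The sketch below is the same algorithm with a counter added; the `comparisons` field and the class name `ComparisonCount` are introduced only for this demonstration.

```java
class ComparisonCount {
    static long comparisons; // number of element comparisons performed so far

    static int timesOccur(int[] small, int[] big) {
        int sum = 0;
        for (int i = 0; i + small.length <= big.length; i++) {
            int j = 0;
            while (j < small.length) {
                comparisons++;
                if (small[j] != big[i + j]) break; // stop at the first mismatch
                j++;
            }
            if (j == small.length) sum++;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] zeros = new int[1000];              // all elements are 0
        int[] ones = new int[1000];
        java.util.Arrays.fill(ones, 1);

        comparisons = 0;
        timesOccur(new int[]{1, 2, 3, 4}, zeros); // first element never matches
        System.out.println(comparisons);          // 997: one comparison per start position

        comparisons = 0;
        timesOccur(new int[]{1, 1, 1, 2}, ones);  // mismatch only at the last element
        System.out.println(comparisons);          // 3988: four comparisons per start position
    }
}
```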

UPD2: explanation of the inner loop.

The purpose of the inner loop is to find out whether the smaller array matches the bigger array starting at position i. To perform that check, index j runs from 0 up to the length of the smaller array, comparing element j of the smaller array with the corresponding element i + j of the bigger array. The loop continues while both conditions hold at the same time: j < small.length, and the corresponding elements of the two arrays match.

So the loop stops in two situations:

  1. j < small.length is false. This means that j == small.length. It also means that the elements of the two arrays matched for all j = 0..small.length-1 (otherwise the loop would have stopped earlier; see (2) below).
  2. small[j] == big[i + j] is false. This means that no match was found. In this case the loop stops before j reaches small.length.

After the loop, it is sufficient to check whether j == small.length to know which condition stopped the loop, and hence whether a match was found at the current position i.

This is a simple subarray matching problem. In Java you can use Collections.indexOfSubList, but you would have to box all the integers in your arrays. Another option is to implement your own array matching algorithm. There are several choices, since most string searching algorithms can be adapted to this task.
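For completeness, here is a minimal sketch of the boxed variant. `Collections.indexOfSubList` only reports the first occurrence, so the example calls it repeatedly on a shrinking `subList` view in order to count every occurrence, including overlapping ones. The class name `SubListCount` and the helper `box` are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SubListCount {
    // Boxes an int[] into a List<Integer>; this is the cost mentioned above.
    static List<Integer> box(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    static int timesOccur(int[] small, int[] big) {
        List<Integer> needle = box(small);
        List<Integer> haystack = box(big);
        if (needle.isEmpty()) return 0; // indexOfSubList would report a match at 0 forever

        int count = 0;
        int from = 0;
        while (from + needle.size() <= haystack.size()) {
            // Search only the part of the haystack that starts at 'from'.
            int idx = Collections.indexOfSubList(haystack.subList(from, haystack.size()), needle);
            if (idx < 0) break;
            count++;
            from += idx + 1; // continue one past the match start to allow overlaps
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(timesOccur(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // 2
        System.out.println(timesOccur(new int[]{1, 1}, new int[]{1, 1, 1}));                // 2
    }
}
```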

Here is an optimized version based on the KMP algorithm. Its worst case is O(n + m), which is better than the trivial algorithm. The downside is that it requires extra space to store the failure function (F).

public class Main {
    public static class KMP {
        private final int F[];
        private final int[] needle;

        public KMP(int[] needle) {
            this.needle = needle;
            this.F = new int[needle.length + 1];

            F[0] = 0;
            F[1] = 0;
            int i = 1, j = 0;
            while (i < needle.length) {
                if (needle[i] == needle[j])
                    F[++i] = ++j;
                else if (j == 0)
                    F[++i] = 0;
                else
                    j = F[j];
            }
        }

        public int countAt(int[] haystack) {
            int count = 0;
            int i = 0, j = 0;
            int n = haystack.length, m = needle.length;

            while (i - j <= n - m) {
                while (j < m) {
                    if (needle[j] == haystack[i]) {
                        i++;
                        j++;
                    } else break;
                }
                if (j == m) count++;
                else if (j == 0) i++;
                j = F[j];
            }
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println(new KMP(new int[]{1, 2}).countAt(new int[]{1, 2, 3, 4, 1, 2, 1, 3}));
        System.out.println(new KMP(new int[]{1, 1}).countAt(new int[]{1, 1, 1}));
    }
}

Rather than posting a solution, I'll provide some hints to get you moving.

It's worth breaking the problem down into smaller pieces. In general, your algorithm should look like:

for each position in the big array
     check if the small array matches that position
         if it does, increment your counter

The smaller piece is then checking whether the small array matches at a given position:

first check if there's enough room to fit the smaller array
    if not then the arrays don't match
otherwise for each position in the smaller array
    check if the values in the arrays match
        if not then the arrays don't match
if you get to the end of the smaller array and they have all matched
    then the arrays match
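If you get stuck, one possible translation of the two pieces above into Java is sketched below. The names `matchesAt` and `countOccurrences`, and the decision to report 0 for an empty small array, are assumptions made for this example rather than part of the hints.

```java
class SubarrayCounter {
    // The smaller piece: does 'small' match 'big' starting at 'position'?
    static boolean matchesAt(int[] small, int[] big, int position) {
        // First check that there is enough room to fit the smaller array.
        if (position + small.length > big.length) return false;
        for (int k = 0; k < small.length; k++) {
            if (small[k] != big[position + k]) return false; // values differ: no match
        }
        return true; // reached the end of the smaller array with everything matching
    }

    // The outer piece: try every position in the big array.
    static int countOccurrences(int[] small, int[] big) {
        if (small.length == 0) return 0; // assumption: an empty pattern counts as no match
        int counter = 0;
        for (int position = 0; position < big.length; position++) {
            if (matchesAt(small, big, position)) counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // 2
    }
}
```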

Though not thoroughly tested, I believe this is a solution to your problem. I would highly recommend using Sprinter's pseudocode to try to figure this out yourself before using this.

public class Main
{
    public static void main(String[] args)
    {
        int[] smallArray = {1, 1};
        int[] bigArray = {1, 1, 1};
        int sum = 0;

        for (int i = 0; i < bigArray.length; i++)
        {
            // Only start a full comparison where the first elements agree.
            if (bigArray[i] == smallArray[0])
            {
                boolean flag = true;
                for (int x = 0; x < smallArray.length; x++)
                {
                    // Ran off the end of bigArray, or found a mismatch.
                    if (i + x >= bigArray.length || bigArray[i + x] != smallArray[x])
                    {
                        flag = false;
                        break;
                    }
                }

                if (flag)
                    sum += 1;
            }
        }

        System.out.println(sum); // prints 2 for the arrays above
    }
}

