How would you find how many times one array is repeated in another one?
For example, if you were given {1,2} as the small array and {1,2,3,4,1,2,1,3} as the big one, then it would return 2.
This is probably horribly incorrect:
public static int timesOccur(int[] small, int big[]) {
    int sum= 0;
    for (int i=0; i<small.length; i++){
        int currentSum = 0;
        for (int j=0; j<big.length; j++){
            if (small[i] == big[j]){
                currentSum ++;
            }
            sum= currentSum ;
        }
    }
    return sum;
}
As @AndyTurner mentioned, your task can be reduced to a well-known family of string matching algorithms.
As I understand it, you want a solution faster than O(n * m), where n is the length of the big array and m is the length of the small one.
There are two main approaches. The first preprocesses the text (the long array); the second preprocesses the search pattern (the small array).
Preprocessing the text. By this I mean building a suffix array (together with its LCP array) from the longer array. Once this data structure is built, you can binary-search it to find your subarray. All suffixes that start with the pattern sit next to each other in the suffix array, so two binary searches (for the first and the last such suffix) give the number of occurrences. The best achievable time is O(n) to build the structure and O(m + log n) to perform the search, so the overall time is O(n + m).
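The text-preprocessing idea can be sketched in Java as below. This is only an illustration under simplifying assumptions: the suffix array is built by plainly sorting suffix start indices (far slower than the O(n) constructions mentioned above), no LCP array is used, so each search costs O(m log n) rather than O(m + log n), and an empty pattern is treated as zero matches. The class and method names (`SuffixArrayCount`, `buildSuffixArray`, `firstIndex`, `count`) are hypothetical.

```java
import java.util.Arrays;

class SuffixArrayCount {
    // Naive construction: sort suffix start indices lexicographically.
    // Assumption: simplicity over speed; this is not a linear-time build.
    static Integer[] buildSuffixArray(int[] text) {
        Integer[] sa = new Integer[text.length];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> compareSuffixes(text, a, b));
        return sa;
    }

    static int compareSuffixes(int[] text, int a, int b) {
        while (a < text.length && b < text.length) {
            if (text[a] != text[b]) return Integer.compare(text[a], text[b]);
            a++;
            b++;
        }
        // The suffix that ran out first is shorter and therefore sorts first.
        return Integer.compare(b, a);
    }

    // <0: suffix sorts before the pattern, 0: suffix starts with the pattern, >0: suffix sorts after.
    static int compareSuffixToPattern(int[] text, int start, int[] pattern) {
        for (int k = 0; k < pattern.length; k++) {
            if (start + k >= text.length) return -1; // suffix is a proper prefix of the pattern
            if (text[start + k] != pattern[k]) return Integer.compare(text[start + k], pattern[k]);
        }
        return 0;
    }

    // Binary search for the first suffix that is >= pattern (pastMatches = false)
    // or the first suffix that is > every match (pastMatches = true).
    static int firstIndex(int[] text, Integer[] sa, int[] pattern, boolean pastMatches) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = compareSuffixToPattern(text, sa[mid], pattern);
            if (c < 0 || (pastMatches && c == 0)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    static int count(int[] pattern, int[] text) {
        if (pattern.length == 0) return 0; // assumption: empty pattern counts as zero matches
        Integer[] sa = buildSuffixArray(text);
        // Matching suffixes form one contiguous block; its width is the answer.
        return firstIndex(text, sa, pattern, true) - firstIndex(text, sa, pattern, false);
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // prints 2
        System.out.println(count(new int[]{1, 1}, new int[]{1, 1, 1}));                // prints 2
    }
}
```

Building the suffix array once pays off mainly when many different small arrays are searched in the same big array.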
Preprocessing the pattern. This means constructing a DFA from the pattern. Once the DFA is built, a single pass over the string (the long array) finds all occurrences of the substring in linear time. The hardest part here is constructing the DFA. Knuth-Morris-Pratt does this in O(m) time, so the overall running time is O(m + n). KMP is most probably the best available solution for this task in terms of efficiency and implementation complexity. Check @JuanLopes's answer for a concrete implementation.
You can also consider an optimized brute-force approach such as Boyer-Moore. It works well in practical cases, but its worst-case running time is O(n * m).
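As a rough illustration of this family, here is a sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm), adapted to int arrays. The names `HorspoolCount` and `count` are hypothetical, a `HashMap` stands in for the usual character table because int values are unbounded, and an empty pattern is assumed to count as zero matches.

```java
import java.util.HashMap;
import java.util.Map;

class HorspoolCount {
    static int count(int[] small, int[] big) {
        int m = small.length, n = big.length;
        if (m == 0 || m > n) return 0; // assumption: empty pattern counts as zero matches

        // shift.get(v) = distance from the last occurrence of v in small[0..m-2] to the end of small.
        // Values that do not occur there allow a full shift of m.
        Map<Integer, Integer> shift = new HashMap<>();
        for (int k = 0; k < m - 1; k++) shift.put(small[k], m - 1 - k);

        int count = 0;
        int i = 0; // current alignment: small[0] sits under big[i]
        while (i <= n - m) {
            int j = m - 1;
            while (j >= 0 && small[j] == big[i + j]) j--; // compare right to left
            if (j < 0) count++;
            // Slide according to the element aligned with the last pattern position.
            i += shift.getOrDefault(big[i + m - 1], m);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // prints 2
        System.out.println(count(new int[]{1, 1}, new int[]{1, 1, 1}));                // prints 2
    }
}
```

On inputs where the last aligned element rarely occurs in the small array, the alignment jumps by up to m positions at a time, which is where the practical speed comes from.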
In case you don't need a fast approach, I corrected your code based on your description:
public static int timesOccur(int[] small, int big[]) {
    int sum = 0;
    for (int i = 0; i < big.length - small.length + 1; i++) {
        int j = 0;
        while (j < small.length && small[j] == big[i + j]) {
            j++;
        }
        if (j == small.length) {
            sum++;
        }
    }
    return sum;
}
Pay attention to the inner while loop. It stops as soon as elements don't match. This is an important optimization, as it makes the running time almost linear in the best case.
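To make the best-case and worst-case difference concrete, the sketch below runs the same two loops but counts element comparisons instead of matches. The class name `ComparisonCount`, the method `comparisons`, and the chosen inputs are made up for illustration.

```java
import java.util.Arrays;

class ComparisonCount {
    // Same loop structure as the brute-force search, but returns how many
    // element comparisons were performed rather than how many matches were found.
    static long comparisons(int[] small, int[] big) {
        long comparisons = 0;
        for (int i = 0; i < big.length - small.length + 1; i++) {
            int j = 0;
            while (j < small.length) {
                comparisons++;
                if (small[j] != big[i + j]) break; // early exit on the first mismatch
                j++;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 1000, m = 10;
        int[] big = new int[n];              // all zeros

        int[] mismatchFirst = new int[m];
        Arrays.fill(mismatchFirst, 1);       // differs from big at the very first element

        int[] mismatchLast = new int[m];     // zeros ...
        mismatchLast[m - 1] = 1;             // ... except the last element

        System.out.println(comparisons(mismatchFirst, big)); // prints 991  (n - m + 1)
        System.out.println(comparisons(mismatchLast, big));  // prints 9910 ((n - m + 1) * m)
    }
}
```

When the mismatch shows up immediately, each position costs one comparison; when it only shows up at the last element, every position costs m comparisons, which is the O(n * m) worst case.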
upd2: inner loop explanation.
The purpose of the inner loop is to find out whether the smaller array matches the bigger array starting at position i. To perform that check, index j is iterated from 0 up to the length of the smaller array, comparing element j of the smaller array with the corresponding element i + j of the bigger array. The loop continues while both conditions are true at the same time: j < small.length and the corresponding elements of the two arrays match.
So the loop stops in two situations:
1. j < small.length is false. This means that j == small.length. It also means that the elements of the two arrays matched for all j = 0..small.length-1 (otherwise the loop would have stopped earlier, see (2) below).
2. small[j] == big[i + j] is false. This means that no match was found: the loop stopped before j reached small.length.
After the loop it's sufficient to check whether j == small.length to know which condition stopped the loop, and hence whether a match was found at the current position i.
This is a simple subarray matching problem. In Java you can use Collections.indexOfSubList, but you would have to box all the integers in your array. Another option is to implement your own array matching algorithm. There are several choices; most string searching algorithms can be adapted to this task.
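For completeness, a minimal sketch of the boxed route might look like this. `Collections.indexOfSubList` only reports the first occurrence, so the sketch calls it repeatedly on a shrinking `subList` view. The class and method names (`SubListCount`, `box`, `count`) are hypothetical, and an empty small array is assumed to count as zero matches.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class SubListCount {
    // Boxes an int[] into a List<Integer>, which is the cost mentioned above.
    static List<Integer> box(int[] a) {
        return Arrays.stream(a).boxed().collect(Collectors.toList());
    }

    static int count(int[] small, int[] big) {
        if (small.length == 0) return 0; // assumption: empty pattern counts as zero matches
        List<Integer> target = box(small);
        List<Integer> source = box(big);
        int count = 0;
        int from = 0;
        while (from <= source.size() - target.size()) {
            // Index of the first occurrence within the remaining view, or -1 if there is none.
            int idx = Collections.indexOfSubList(source.subList(from, source.size()), target);
            if (idx < 0) break;
            count++;
            from += idx + 1; // step one past the match start so overlapping matches are counted
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // prints 2
        System.out.println(count(new int[]{1, 1}, new int[]{1, 1, 1}));                // prints 2
    }
}
```

This is short to write, but it does not improve on the brute-force worst case and allocates an `Integer` list for each array.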
Here is an optimized version based on the KMP algorithm. In the worst case it is O(n + m), which is better than the trivial algorithm. Its downside is that it requires extra space to store the failure function (F).
public class Main {
    public static class KMP {
        // F[k] = length of the longest proper prefix of needle[0..k) that is also its suffix
        private final int F[];
        private final int[] needle;

        // Assumes needle is non-empty (F[1] is written unconditionally).
        public KMP(int[] needle) {
            this.needle = needle;
            this.F = new int[needle.length + 1];
            F[0] = 0;
            F[1] = 0;
            int i = 1, j = 0;
            while (i < needle.length) {
                if (needle[i] == needle[j])
                    F[++i] = ++j;
                else if (j == 0)
                    F[++i] = 0;
                else
                    j = F[j];
            }
        }

        // Counts occurrences of needle in haystack, including overlapping ones.
        public int countAt(int[] haystack) {
            int count = 0;
            int i = 0, j = 0;
            int n = haystack.length, m = needle.length;
            // i - j is the start of the current alignment; stop once needle no longer fits
            while (i - j <= n - m) {
                while (j < m) {
                    if (needle[j] == haystack[i]) {
                        i++;
                        j++;
                    } else break;
                }
                if (j == m) count++;
                else if (j == 0) i++;
                // fall back to the longest border instead of restarting from scratch
                j = F[j];
            }
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println(new KMP(new int[]{1, 2}).countAt(new int[]{1, 2, 3, 4, 1, 2, 1, 3}));
        System.out.println(new KMP(new int[]{1, 1}).countAt(new int[]{1, 1, 1}));
    }
}
Rather than posting a solution, I'll provide some hints to get you moving.
It's worth breaking the problem down into smaller pieces. In general, your algorithm should look like:
for each position in the big array
    check if the small array matches that position
    if it does, increment your counter
The smaller piece is then checking whether the small array matches a given position:
first check if there's enough room to fit the smaller array
    if not then the arrays don't match
otherwise for each position in the smaller array
    check if the values in the arrays match
        if not then the arrays don't match
if you get to the end of the smaller array and they have all matched
    then the arrays match
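If you want something to compare your own attempt against, one possible Java reading of the two pieces of pseudocode is sketched below. The names `HintedCount` and `matchesAt` are hypothetical; note that with this reading an empty small array matches at every position.

```java
class HintedCount {
    // The smaller piece: does small match big starting at position pos?
    static boolean matchesAt(int[] small, int[] big, int pos) {
        // first check if there's enough room to fit the smaller array
        if (pos + small.length > big.length) return false;
        // otherwise compare position by position
        for (int k = 0; k < small.length; k++) {
            if (big[pos + k] != small[k]) return false; // values differ: no match
        }
        return true; // reached the end of the smaller array and everything matched
    }

    // The outer piece: try every position in the big array.
    static int timesOccur(int[] small, int[] big) {
        int counter = 0;
        for (int pos = 0; pos < big.length; pos++) {
            if (matchesAt(small, big, pos)) counter++;
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(timesOccur(new int[]{1, 2}, new int[]{1, 2, 3, 4, 1, 2, 1, 3})); // prints 2
        System.out.println(timesOccur(new int[]{1, 1}, new int[]{1, 1, 1}));                // prints 2
    }
}
```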
Though not thoroughly tested, I believe this is a solution to your problem. I would highly recommend using Sprinter's pseudocode to try to figure this out yourself before using this.
public class Main {
    public static void main(String[] args)
    {
        int[] smallArray = {1,1};
        int[] bigArray = {1,1,1};
        int sum = 0;
        for(int i = 0; i < bigArray.length; i++)
        {
            boolean flag = true;
            // only attempt a full comparison when the first elements agree
            if(bigArray[i] == smallArray[0])
            {
                for(int x = 0; x < smallArray.length; x++)
                {
                    if(i + x >= bigArray.length)
                        flag = false; // ran past the end of the big array
                    else if(bigArray[i + x] != smallArray[x])
                        flag = false; // mismatch at offset x
                }
                if(flag)
                    sum += 1;
            }
        }
        System.out.println(sum); // prints 2 for the arrays above
    }
}
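Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.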