Find the smallest window of input array that contains all the elements of query array

Problem: Given an input array of integers of size n and a query array of integers of size k, find the smallest window of the input array that contains all the elements of the query array, in the same order.

I have tried the approach below.

        int[] inputArray = new int[] { 2, 5, 2, 8, 0, 1, 4, 7 };
        int[] queryArray = new int[] { 2, 1, 7 };

First, I find the positions of every query array element in inputArray.

public static void SmallestWindow(int[] inputArray, int[] queryArray)
    {
        Dictionary<int, HashSet<int>> dict = new Dictionary<int, HashSet<int>>();

        int index = 0;
        foreach (int i in queryArray)
        {
            HashSet<int> hash = new HashSet<int>();
            foreach (int j in inputArray)
            {
                index++;
                if (i == j)
                    hash.Add(index); 
            }
            dict.Add(i, hash);
            index = 0;
        }
      // Need to perform action in above dictionary.??
    }

I got the following dictionary:

  1. int 2 --> positions {1, 3}
  2. int 1 --> position {6}
  3. int 7 --> position {8}

Now I want to perform the following steps to find the minimum window:

  1. Compare the positions of int 2 with the position of int 1. Since (6-3) < (6-1), I will store 3 and 6 in a hashmap.

  2. Compare the positions of int 1 and int 7 in the same way.

I cannot understand how to compare two consecutive values of a dictionary. Please help.

The algorithm:
For each element in the query array, store an entry in a map M (V → (I,P)), where V is the element, I is an index into the input array, and P is the element's position in the query array. (For a given P, I is the largest index such that query[0..P] is a subsequence of input[I..curr].)

Iterate through the input array.
If the value is the first term in the query array: store the current index as I.
Else: copy the index stored for the previous element of the query array, e.g. M[currVal].I = M[query[M[currVal].P-1]].I.
If the value is the last term: check whether [I..curr] is a new best.
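The steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the linked implementation: the function name `smallestOrderedWindow`, the `{-1, -1}` "no window" result and the choice to keep I in a vector indexed by P (instead of inside the map entry) are assumptions made for illustration. Like the description, it assumes the query array has no repeated values.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (name and signature are not from the answer).
// Returns {first, last} as 0-based inclusive indices of the smallest window
// of `input` that contains `query` as a subsequence, or {-1, -1} if none.
// Assumes `query` is non-empty and has no repeated values.
std::pair<int, int> smallestOrderedWindow(const std::vector<int>& input,
                                          const std::vector<int>& query) {
    assert(!query.empty());
    const int k = static_cast<int>(query.size());

    // P: value -> position in the query array.
    std::unordered_map<int, int> posInQuery;
    for (int p = 0; p < k; ++p) posInQuery[query[p]] = p;

    // I: start[p] is the largest index such that query[0..p] is a
    // subsequence of input[start[p]..curr]; -1 means "no such index yet".
    std::vector<int> start(k, -1);

    std::pair<int, int> best{-1, -1};
    for (int curr = 0; curr < static_cast<int>(input.size()); ++curr) {
        auto it = posInQuery.find(input[curr]);
        if (it == posInQuery.end()) continue;  // value is not in the query
        const int p = it->second;

        // First term: a window can start here. Otherwise inherit the start
        // recorded for the previous query term.
        start[p] = (p == 0) ? curr : start[p - 1];

        // Last term with a valid start: check for a new best window.
        if (p == k - 1 && start[p] >= 0 &&
            (best.first < 0 || curr - start[p] < best.second - best.first)) {
            best = {start[p], curr};
        }
    }
    return best;
}
```

With the question's data, `smallestOrderedWindow({2, 5, 2, 8, 0, 1, 4, 7}, {2, 1, 7})` yields the 0-based window 2 to 7.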

Complexity
The complexity of this is O(N), where N is the size of the input array.

NB
This approach expects that no elements are repeated in the query array. To cater for repeats, we can use a map M (V → listOf((I,P))). This is O(N hC(Q)), where hC(Q) is the number of occurrences of the most frequent value (the mode) in the query array.
Even better would be to use M (V → listOf((linkedList(I), P))). Where repeated elements occur consecutively in the query array, we use a linked list, so updating those values becomes O(1). The complexity is then O(N hC(D(Q))), where D(Q) is Q with consecutive repeated terms merged.
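A rough C++ sketch of the first variant, M (V → listOf((I,P))), might look like this. The function name `smallestOrderedWindowDup` and the `{-1, -1}` "no window" result are assumptions for illustration, and the linked-list refinement for consecutive repeats is not attempted here. Each value maps to every query position that holds it, and those positions are updated from highest to lowest so that an input element is never used for two query terms at once.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical variant (names are illustrative) that tolerates repeated
// values in `query`. Returns {first, last} inclusive, or {-1, -1} if none.
std::pair<int, int> smallestOrderedWindowDup(const std::vector<int>& input,
                                             const std::vector<int>& query) {
    assert(!query.empty());
    const int k = static_cast<int>(query.size());

    // value -> every query position holding that value, highest first.
    std::unordered_map<int, std::vector<int>> positions;
    for (int p = k - 1; p >= 0; --p) positions[query[p]].push_back(p);

    // start[p]: largest index such that query[0..p] is a subsequence of
    // input[start[p]..curr]; -1 means "not matched yet".
    std::vector<int> start(k, -1);

    std::pair<int, int> best{-1, -1};
    for (int curr = 0; curr < static_cast<int>(input.size()); ++curr) {
        auto it = positions.find(input[curr]);
        if (it == positions.end()) continue;  // value is not in the query

        // Highest position first, so start[p - 1] still holds the value
        // computed before the current input element was consumed.
        for (int p : it->second) {
            start[p] = (p == 0) ? curr : start[p - 1];
            if (p == k - 1 && start[p] >= 0 &&
                (best.first < 0 || curr - start[p] < best.second - best.first)) {
                best = {start[p], curr};
            }
        }
    }
    return best;
}
```

The inner loop runs once per occurrence of the current value in the query, which is where the hC(Q) factor in the bound comes from.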

Implementation
A sample Java implementation is available here. It does not handle repeated elements in the query array, nor does it do error checking, etc.

I don't see how using HashSet and Dictionary will help you in this. Were I faced with this problem, I'd go about it quite differently.

One way to do it (not the most efficient way) is shown below. This code assumes that queryArray contains at least two items.

int FindInArray(int[] a, int start, int value)
{
    for (int i = start; i < a.Length; ++i)
    {
        if (a[i] == value)
            return i;
    }
    return -1;
}

struct Pair
{
    public int first;  // index where the match starts
    public int last;   // index where the match ends
}

List<Pair> foundPairs = new List<Pair>();

int startPos = 0;
bool found = true;
while (found)
{
    found = false;
    // find next occurrence of queryArray[0] in inputArray
    startPos = FindInArray(inputArray, startPos, queryArray[0]);
    if (startPos == -1)
    {
        // no more occurrences of the first item
        break;
    }
    Pair p = new Pair();
    p.first = startPos;
    ++startPos;
    int nextPos = startPos;
    // now find occurrences of remaining items
    for (int i = 1; i < queryArray.Length; ++i)
    {
        nextPos = FindInArray(inputArray, nextPos, queryArray[i]);
        if (nextPos == -1)
        {
            break;  // didn't find it
        }
        else
        {
            p.last = nextPos++;
            found = (i == queryArray.Length-1);
        }
    }
    if (found)
    {
        foundPairs.Add(p);
    }
}

// At this point, the foundPairs list contains the (start, end) of all
// sublists that contain the items in order.
// You can then iterate through that list, subtract (last-first), and take
// the item that has the smallest value.  That will be the shortest sublist
// that matches the criteria.

With some work, this could be made more efficient. For example, if queryArray contains [1, 2, 3] and inputArray contains [1, 7, 4, 9, 1, 3, 6, 4, 1, 8, 2, 3], the above code will find three matches (starting at positions 0, 4, and 8). Slightly smarter code could determine that when the 1 at position 4 is found, since no 2 was found prior to it, any sequence starting at the first position would be longer than the sequence starting at position 4, and therefore short-circuit the first sequence and start over at the new position. That complicates the code a bit, though.
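One way to avoid re-scanning from every early start is a forward scan followed by a backward scan. This is a different technique from the short-circuit described above, but it has the same effect of discarding starts that cannot win: once a match ending at some index is found, walking backward from that end locates the latest start that still matches. The sketch below is in C++, and the function name `smallestWindowTwoPass` and the `{-1, -1}` "no window" result are assumptions for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not from the answer). Returns {first, last} as
// inclusive 0-based indices, or {-1, -1} when the query never matches.
std::pair<int, int> smallestWindowTwoPass(const std::vector<int>& input,
                                          const std::vector<int>& query) {
    assert(!query.empty());
    const int n = static_cast<int>(input.size());
    const int k = static_cast<int>(query.size());

    std::pair<int, int> best{-1, -1};
    int i = 0;
    while (i < n) {
        // Forward pass: match the whole query greedily, starting at i.
        int matched = 0;
        int end = -1;
        for (int j = i; j < n; ++j) {
            if (input[j] == query[matched] && ++matched == k) {
                end = j;
                break;
            }
        }
        if (end < 0) break;  // no further match is possible

        // Backward pass: from `end`, match the query in reverse to find the
        // latest start that still contains it in order.
        int start = end;
        for (int q = k - 1; ; --start) {
            if (input[start] == query[q] && --q < 0) break;
        }

        if (best.first < 0 || end - start < best.second - best.first) {
            best = {start, end};
        }
        i = start + 1;  // resume just after the start that was used
    }
    return best;
}
```

For the example above, the forward pass from position 0 ends at index 11 and the backward pass moves the start straight to index 8, so the candidate starts at 0 and 4 are never expanded separately.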

You want not a HashSet but a (sorted) tree or array as the value in the dictionary; the dictionary contains mappings from values you find in the input array to the (sorted) list of indices where that value appears.

Then you do the following:

  • Look up the first entry in the query. Pick the lowest index where it appears.
  • Look up the second entry; pick the lowest index greater than the index of the first.
  • Look up the third; pick the lowest index greater than that of the second. (Etc.)
  • When you reach the last entry in the query, (1 + last index - first index) is the size of the smallest match starting at that first index.
  • Now pick the second index of the first query entry, repeat, etc.
  • Pick the smallest match found from any of the starting indices.

(Note that "lowest entry greater than" is an operation supplied by sorted trees, or can be done via binary search on a sorted array.)

The complexity of this is approximately O(M*n*log n), where M is the length of the query and n is the average number of indices at which a given value appears in the input array. You can modify the strategy by picking the query array value that appears least often as the starting point and going up and down from there; if there are k of those entries (k <= n) then the complexity is O(M*k*log n).
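The basic strategy (starting from the first query entry, without the least-frequent-value refinement) could be sketched in C++ as below. The function name `smallestWindowByIndexLists` and the `{-1, -1}` "no window" result are assumptions for illustration; `std::upper_bound` supplies the "lowest entry greater than" operation on each sorted index list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the answer). Returns {first, last} as
// inclusive 0-based indices, or {-1, -1} when there is no match.
std::pair<int, int> smallestWindowByIndexLists(const std::vector<int>& input,
                                               const std::vector<int>& query) {
    assert(!query.empty());

    // value -> ascending list of indices where that value appears.
    std::map<int, std::vector<int>> indices;
    for (int i = 0; i < static_cast<int>(input.size()); ++i) {
        indices[input[i]].push_back(i);
    }

    std::pair<int, int> best{-1, -1};
    auto firstIt = indices.find(query[0]);
    if (firstIt == indices.end()) return best;

    // Try every occurrence of the first query entry as a starting index.
    for (int first : firstIt->second) {
        int last = first;
        bool matched = true;
        for (std::size_t q = 1; q < query.size(); ++q) {
            auto it = indices.find(query[q]);
            if (it == indices.end()) { matched = false; break; }
            const std::vector<int>& list = it->second;
            // Lowest index strictly greater than the previous pick.
            auto next = std::upper_bound(list.begin(), list.end(), last);
            if (next == list.end()) { matched = false; break; }
            last = *next;
        }
        if (!matched) break;  // a later start cannot match either
        if (best.first < 0 || last - first < best.second - best.first) {
            best = {first, last};
        }
    }
    return best;
}
```

Because each pick must be strictly greater than the previous one, the sketch also copes with values that are repeated in the query.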

After you have all the positions (indexes) in the inputArray:

2 --> position {0,2}   // note: I change them to 0-based array
1 --> position {5,6}  // I suppose it's {5,6} to make it more complex, in your code it's only {5}
7 --> position {7}

I use recursion to get all possible paths: [0->5->7] [0->6->7] [2->5->7] [2->6->7]. The total is 2*2*1=4 possible paths. Obviously the path with Min(Last-First) is the shortest one (the smallest window); the numbers in the middle of the path don't matter. Here comes the code.

 struct Pair
 {
     public int Number;  // the number in queryArray
     public int[] Indexes;  // the positions of the number
 }
 static List<int[]> results = new List<int[]>(); //store all possible paths
 static Stack<int> currResult = new Stack<int>(); // the container of current path
 static int[] inputArray, queryArray; 
 static Pair[] pairs;

After the data structures, here is the Main method:

inputArray = new int[] { 2, 7, 1, 5, 2, 8, 0, 1, 4, 7 }; //my test case
queryArray = new int[] { 2, 1, 7 };
pairs = (from n in queryArray
      select new Pair { Number = n, Indexes = inputArray.FindAllIndexes(i => i == n) }).ToArray();
Go(0);

FindAllIndexes is an extension method to help find all the indexes.

public static int[] FindAllIndexes<T>(this IEnumerable<T> source, Func<T,bool> predicate)
{
     //do necessary check here, then
     Queue<int> indexes = new Queue<int>();
     for (int i = 0;i<source.Count();i++)
           if (predicate(source.ElementAt(i))) indexes.Enqueue(i);
     return indexes.ToArray();
}

The recursion method:

static void Go(int depth)
{
    if (depth == pairs.Length)
    {
        results.Add(currResult.Reverse().ToArray());
    }
    else
    {
        var indexes = pairs[depth].Indexes;
        for (int i = 0; i < indexes.Length; i++)
        {
            // Peek() is the most recently pushed index; Last() on a Stack returns the first one pushed
            if (depth == 0 || indexes[i] > currResult.Peek())
            {
                currResult.Push(indexes[i]);
                Go(depth + 1);
                currResult.Pop();
            }
        }
    }
}

Finally, a loop over results can find the Min(Last-First) result (the shortest window).
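That final selection step might look like the following C++ sketch. The function name `shortestPathIndex` and the parameter `paths` are hypothetical; `paths` plays the role of `results`, with each path holding one ascending input index per query element.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the position in `paths` of the path with the
// smallest (last - first), or -1 when `paths` is empty. Every path is
// assumed to be non-empty.
int shortestPathIndex(const std::vector<std::vector<int>>& paths) {
    if (paths.empty()) return -1;

    // Width of a path: distance between its last and first index.
    auto span = [](const std::vector<int>& p) {
        assert(!p.empty());
        return p.back() - p.front();
    };

    auto it = std::min_element(
        paths.begin(), paths.end(),
        [&span](const std::vector<int>& a, const std::vector<int>& b) {
            return span(a) < span(b);
        });
    return static_cast<int>(it - paths.begin());
}
```

For the four example paths listed earlier, [0->5->7] [0->6->7] [2->5->7] [2->6->7], the widths are 7, 7, 5 and 5, so the first of the two shortest paths (position 2) is selected.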

Algorithm:

  1. get all indexes into the inputArray of all queryArray values
  2. order them ascending by index
  3. using each index (x) as a starting point, find the first higher index (y) such that the segment inputArray[x..y] contains all queryArray values
  4. keep only those segments that have the queryArray items in order
  5. order the segments by their lengths, ascending

C# implementation:

First get all indexes into the inputArray of all queryArray values and order them ascending by index.

public static int[] SmallestWindow(int[] inputArray, int[] queryArray)
{
    var indexed = queryArray
        .SelectMany(x => inputArray
                             .Select((y, i) => new
                                 {
                                     Value = y,
                                     Index = i
                                 })
                             .Where(y => y.Value == x))
        .OrderBy(x => x.Index)
        .ToList();

Next, using each index (x) as a starting point, find the first higher index (y) such that the segment inputArray[x..y] contains all queryArray values.

    var segments = indexed
        .Select(x =>
            {
                var unique = new HashSet<int>();
                return new
                    {
                        Item = x,
                        Followers = indexed
                            .Where(y => y.Index >= x.Index)
                            .TakeWhile(y => unique.Count != queryArray.Length)
                            .Select(y =>
                                {
                                    unique.Add(y.Value);
                                    return y;
                                })
                            .ToList(),
                        IsComplete = unique.Count == queryArray.Length
                    };
            })
        .Where(x => x.IsComplete);

Now keep only those segments that have the queryArray items in order.

    var queryIndexed = segments
        .Select(x => x.Followers.Select(y => new
            {
                QIndex = Array.IndexOf(queryArray, y.Value),
                y.Index,
                y.Value
            }).ToArray());

    var queryOrdered = queryIndexed
        .Where(item =>
            {
                var qindex = item.Select(x => x.QIndex).ToList();
                bool changed;
                do
                {
                    changed = false;
                    for (int i = 1; i < qindex.Count; i++)
                    {
                        if (qindex[i] <= qindex[i - 1])
                        {
                            qindex.RemoveAt(i);
                            changed = true;
                        }
                    }
                } while (changed);
                return qindex.Count == queryArray.Length;
            });

Finally, order the segments by their lengths, ascending. The first segment in the result is the smallest window into inputArray that contains all queryArray values in the order of queryArray.

    var result = queryOrdered
        .Select(x => new[]
            {
                x.First().Index,
                x.Last().Index
            })
        .OrderBy(x => x[1] - x[0]);

    var best = result.FirstOrDefault();
    return best;
}

Test it with:

public void Test()
{
    var inputArray = new[] { 2, 1, 5, 6, 8, 1, 8, 6, 2, 9, 2, 9, 1, 2 };
    var queryArray = new[] { 6, 1, 2 };

    var result = SmallestWindow(inputArray, queryArray);

    if (result == null)
    {
        Console.WriteLine("no matching window");
    }
    else
    {
        Console.WriteLine("Smallest window is indexes " + result[0] + " to " + result[1]);
    }
}

Output:

Smallest window is indexes 3 to 8

Thank you everyone for your input. I have changed my code a bit and found it working. It might not be very efficient, but I'm happy to have solved it using my own head :). Please give your feedback.

Here is my Pair class, with the number and its positions as fields:

    public class Pair
    {
    public int Number;
    public List<int> Position;
    }

Here is a method that returns the list of all Pairs.

     public static Pair[]  GetIndex(int[] inputArray, int[] query)
      {
        Pair[] pairList = new Pair[query.Length]; 
        int pairIndex = 0;
        foreach (int i in query)
        {
            Pair pair = new Pair();
            int index = 0;
            pair.Position = new List<int>();
            foreach (int j in inputArray)
            {                    
                if (i == j)
                {
                    pair.Position.Add(index);
                }
                index++;
            }
            pair.Number = i;
            pairList[pairIndex] = pair;
            pairIndex++;
        }
        return pairList;
    }

Here is the code in the Main method:

        Pair[] pairs = NewCollection.GetIndex(array, intQuery);

        List<int> minWindow = new List<int>();
        for (int i = 0; i <pairs.Length - 1; i++)
        {
            List<int> first = pairs[i].Position;
            List<int> second = pairs[i + 1].Position;
            int? temp = null;
            int? temp1 = null;
            foreach(int m in first)
            {
                foreach (int n in second)
                {
                    if (n > m)
                    {
                        temp = m;
                        temp1 = n;
                    }                        
                }                    
            }
            if (temp.HasValue && temp1.HasValue)
            {
                if (!minWindow.Contains((int)temp))
                    minWindow.Add((int)temp);
                if (!minWindow.Contains((int)temp1))
                    minWindow.Add((int)temp1);
            }
            else
            {
                Console.WriteLine(" Bad Query array");
                minWindow.Clear();
                break;                    
            }
        }

        if(minWindow.Count > 0)
        {
         Console.WriteLine("Minimum Window is :");
         foreach(int i in minWindow)
         {
             Console.WriteLine(i + " ");
         }
        }

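It is worth noting that this problem is related to the longest common subsequence problem, so coming up with an algorithm that runs in better than O(n^2) time in the general case with duplicates is difficult.

Just in case someone is interested in a C++ implementation with O(n log(k)):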

    void findMinWindow(const vector<int>& input, const vector<int>& query) {
         map<int, int> qtree;
         for(vector<int>::const_iterator itr=query.begin(); itr!=query.end(); itr++) {
            qtree[*itr] = 0;
         }

         int first_ptr=0;
         int begin_ptr=0;

         int index1 = 0;
         int queptr = 0;

         int flip = 0;

         while(true) {
             //check if value is in query
             if(qtree.find(input[index1]) != qtree.end()) {
                int x = qtree[input[index1]];
                if(0 == x) {
                  flip++;
                }
                qtree[input[index1]] = ++x;
              }

              //remove all nodes that are not required and
              //yet satisfy the all query condition.
              while(query.size() == flip) {
                //done nothing more
                if(queptr == input.size()) {
                  break;
                }

                //check if queptr is pointing to node in the query
                if(qtree.find(input[queptr]) != qtree.end()) {
                  int y = qtree[input[queptr]];
                  //more nodes and the queue is pointing to deleteable node
                  //condense the nodes
                  if(y > 1) {
                    qtree[input[queptr]] = --y;
                    queptr++;
                  } else {
                    //cant condense more just keep that memory
                    if((!first_ptr && !begin_ptr) ||
                        ((first_ptr-begin_ptr)>(index1-queptr))) {
                      first_ptr=index1;
                      begin_ptr=queptr;
                    }
                    break;
                  }
                } else {
                  queptr++;
                }
              }

             index1++;

             if(index1==input.size()) {
                break;
             }
         }
         cout<<"["<<begin_ptr<<" - "<<first_ptr<<"]"<<endl;
    }

Here is the main function for calling it.

    #include <iostream>
    #include <vector>
    #include <map>

    using namespace std;

    int main() {
        vector<int> input;
        input.push_back(2);
        input.push_back(5);
        input.push_back(2);
        input.push_back(8);
        input.push_back(0);
        input.push_back(1);
        input.push_back(4);
        input.push_back(7);

        vector<int> query1;
        query1.push_back(2);
        query1.push_back(8);
        query1.push_back(0);

        vector<int> query2;
        query2.push_back(2);
        query2.push_back(1);
        query2.push_back(7);

        vector<int> query3;
        query3.push_back(1);
        query3.push_back(4);

        findMinWindow(input, query1);
        findMinWindow(input, query2);
        findMinWindow(input, query3);
    }
