简体   繁体   中英

Finding a minimum length substring in string S which contains all characters from string T using Hash Table in O(n)

I know that this question has already been asked more than once.My doubt is not finding the solution of this problem, but the correct complexity. I read an answer here:

Minimum window width in string x that contains all characters in string y
In this solution,the complexity proposed is O(n) using hash table.Now till mapping the characters of string T in hash table is fine,but when it comes to finding the minimum and maximum element from the hash table,in almost every answer,I read a linked list is used and they write that the complexity in updating a linked list is O(1).This is where the doubt is.

Please consider the following example:

S:abcbae

T:abc

Initially t["a"],t["b"] and t["c"] will be -1,after 3 passes the values will be 0,1,2 respectively(in hash table) and the double linked list will be containing the elements [(a,0)-(b,1)-(c,2)] .

Now the fourth character in string S is b ,so before appending new value of index for b in DLL,I need to remove it's previous instance for which I will have to traverse the DLL.

My question is then how come is the solution for updating DLL is O(1),isn't it O(n),which would lead to a total complexity of O(n^2)?

[O(n) for traversing string S,then O(n) for updating DLL]

Please correct me if I am wrong.

The data structure described is a doubly-linked list, visualised as follows, using the example in the answer :

HEAD <=> ('c', 0) <=> ('b', 3) <=> ('a', 5) <=> TAIL

coupled with an array, which is visualised as follows:

      'a'      |      'b'      |      'c'
 {5, &list[2]} | {3, &list[1]} | {0, &list[0]}

(notice the duplication of information)

At any point during the described algorithm, the following operations are necessary:

  1. insertion of an element in the array
  2. insertion of an element in the list
  3. finding the index value for a given character
  4. removal of a node from the list

The time-complexity of #1 is O(1) (trivial), and of #2 also O(1) because of our choice of doubly-linked list.

The time-complexity of #3 is also O(1), because of our choice of array. It would still be O(1) for a hash-table-backed map.

The time-complexity of #4 is O(1) because our array holds a pointer to each node in the list, and each character only ever has one node in the list, therefore removal is also trivial.

It's important to note (and understand) here that at any iteration of the proposed algorithm, the minimum window cannot be made larger than what it already was in the last step (why? prove it), and therefore we only need at most one node per character in the string Y (or T in your question).

Hope this helps.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM