I am trying to solve a specific variant of the problem mentioned here:
Given a string s and a string t, check if s is a subsequence of t.
I wrote an algorithm that works fine for the above question:
def isSubsequence(s, t):
    """
    :type s: str
    :type t: str
    :rtype: bool
    """
    i = 0
    for x in t:
        if i < len(s) and x == s[i]:
            i = i + 1
    return i == len(s)
Now there is a particular use case:
If there are lots of incoming S, say S1, S2, ... , Sk where k >= 1 billion, and you want to check them one by one to see whether each is a subsequence of T.
There is a hint:
/**
* If we check each sk in this way, then it would be O(kn) time where k is the number of s and n is the length of t.
* This is inefficient.
* Since there is a lot of s, it would be reasonable to preprocess t to generate something that is easy to search for if a character of s is in t.
* Sounds like a HashMap, which is super suitable for search for existing stuff.
*/
But this seems to invert the logic of the algorithm above: if s is traversed and each character is looked up in t using a hashmap, the result will not always be correct, because a hashmap of t stores only one index per character and there is no guarantee that the order will be preserved.
So I am stuck on how to optimize the algorithm for this use case.
Thanks for your help.
For each i less than len(t), and each character c that occurs in t, make a mapping from (i, c) -> j, where j is the first index >= i at which c occurs.
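Then you can iterate through each Sk, using the map to find the next occurrence of each required character, if it exists: start at i = 0, look up (i, c) for each character c of Sk, and continue from j + 1. If any lookup fails, Sk is not a subsequence of t.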
This essentially builds a deterministic finite automaton that matches subsequences of t (https://en.wikipedia.org/wiki/Deterministic_finite_automaton).
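The mapping described in this answer could be sketched roughly as follows. This is only one possible way to build it, not necessarily what the answerer had in mind: the names build_next_table and is_subsequence are made up for illustration, and the table is stored as a list of dicts (one per start index), which costs about len(t) times the alphabet size in memory.

```python
def build_next_table(t):
    """Return nxt where nxt[i][c] is the smallest j >= i with t[j] == c.

    A missing key means c does not occur at or after index i.
    (Hypothetical helper name; layout is an assumption.)
    """
    n = len(t)
    nxt = [None] * (n + 1)
    nxt[n] = {}  # nothing occurs at or after the end of t
    # Fill from right to left: row i is row i + 1 with t[i] overridden.
    for i in range(n - 1, -1, -1):
        row = dict(nxt[i + 1])
        row[t[i]] = i
        nxt[i] = row
    return nxt


def is_subsequence(s, nxt):
    """Check s against a table built once from t; O(len(s)) per query."""
    pos = 0
    for c in s:
        j = nxt[pos].get(c)
        if j is None:
            return False  # c never occurs at or after pos
        pos = j + 1  # keep matching strictly after the matched index
    return True


nxt = build_next_table("ahbgdc")
print(is_subsequence("abc", nxt))  # True
print(is_subsequence("axc", nxt))  # False
```

The table is built once, so each Sk is checked in time proportional to its own length rather than the length of t.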
You can preprocess t to create a list of all possible subsequences (keep in mind that t will have up to 2^len(t) - 1 non-empty subsequences, so this is only feasible for a short t). You can turn this into a hashtable and then iterate over your list of s, checking whether each s is in the table. The advantage is that you don't have to iterate over t for each s.
By the way, if you get stuck on preprocessing t for a list of all subsequences, you should look into powerset and its implementation in Python.
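For a short t, the powerset idea might look something like the sketch below. The function name all_subsequences is made up for illustration. The sketch relies on itertools.combinations, which keeps elements in their original order, and it includes the empty string so that it agrees with isSubsequence for an empty s. Because the set grows exponentially with len(t), treat this as a toy for small inputs only.

```python
from itertools import combinations


def all_subsequences(t):
    """Return the set of distinct subsequences of t, including "".

    Hypothetical helper; time and memory are exponential in len(t).
    """
    subs = {""}
    for r in range(1, len(t) + 1):
        # combinations() picks r characters while preserving their order in t.
        for combo in combinations(t, r):
            subs.add("".join(combo))
    return subs


table = all_subsequences("abc")
print("ac" in table)  # True
print("ca" in table)  # False
```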