I have two lists:
listA = ['123', '345', '678']
listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']
I want to find, for each element of listA, the position in listB of the element that contains it. For example, the expected output is
0 3 2
where 123 appears in the first element of listB, so the result is 0, and 345 appears in the fourth position of listB, so the result is 3. Note that both lists are very large (about 500K elements each), so the nested for loop is too slow. Can you suggest a faster solution? This is my current solution:
for i in range(len(listA)):
    for j in range(len(listB)):
        if listA[i] in listB[j]:
            print('Position', j)
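As a reference point for checking faster versions, the question's nested loop can be wrapped in a function that collects one position per listA element instead of printing. This is a minimal sketch: the name find_positions_naive is hypothetical, and it assumes you want the first matching position in listB, or None when nothing matches.

```python
def find_positions_naive(list_a, list_b):
    """Brute-force baseline: O(len(list_a) * len(list_b)) substring checks."""
    result = []
    for a in list_a:
        pos = None  # stays None if no element of list_b contains `a`
        for j, b in enumerate(list_b):
            if a in b:
                pos = j
                break  # keep only the first matching position
        result.append(pos)
    return result


listA = ['123', '345', '678']
listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']
print(find_positions_naive(listA, listB))  # [0, 3, 2]
```

This is still quadratic; it only pins down the expected output so that other approaches can be compared against it.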
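You can try something like this. Looking up a key in a dictionary takes constant time on average, so the solution builds a dictionary that maps the number in each listB entry to that entry's index and then looks up each element of listA in it. This assumes the digits in each listB entry form exactly the number you are searching for (an exact match rather than the substring test in the question).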
In [1]: import re
In [2]: listA = ['123', '345', '678']
In [3]: listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']
In [4]: # Map the number in each listB entry to that entry's index
In [5]: mapping = {re.sub(r'\D+', '', value).strip(): index for index, value in enumerate(listB)}
In [6]: mapping # Print mapping dictionary
Out[6]: {'123': 0, '455': 1, '678': 2, '345': 3}
In [7]: # Find the desired output
In [8]: output = [mapping.get(item) for item in listA]
In [9]: output
Out[9]: [0, 3, 2]
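The IPython session above can also be written as a standalone function. This sketch assumes, like the session, that the digits in each listB entry, taken together, are exactly the number to look up; find_positions_dict is a hypothetical name. Unlike the plain dict comprehension, it uses setdefault so that a number occurring in several listB entries keeps its first index, and dict.get returns None for numbers that never occur.

```python
import re


def find_positions_dict(list_a, list_b):
    # Assumption: the digits of each list_b entry, concatenated, are the key;
    # a list_a item matches only if it equals that key exactly.
    mapping = {}
    for index, value in enumerate(list_b):
        key = re.sub(r'\D+', '', value)  # drop every non-digit character
        mapping.setdefault(key, index)   # keep the first index if a number repeats
    # dict.get returns None for numbers that never appear in list_b
    return [mapping.get(item) for item in list_a]


listA = ['123', '345', '678']
listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']
print(find_positions_dict(listA, listB))  # [0, 3, 2]
```

Building the dictionary touches each character of listB once, and each lookup is constant time on average, so the total work grows linearly with the input instead of with the product of the two list sizes.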
It essentially depends on your dataset. If your dataset is large enough that you need low complexity, I'd suggest looking into the Aho-Corasick algorithm. The gist of it is that you preprocess listA into a trie in which each node has a failure link to the node for the longest proper suffix of that node's string that is also in the trie. Because of this, you can simply iterate across each character of each word in listB while following the trie built during preprocessing, falling back along failure links on a mismatch instead of restarting the search. Thus the cost of processing listA is added to the cost of scanning listB rather than multiplied by it.
As a side note, this doesn't reduce complexity if listA is dynamic, because the trie has to be rebuilt whenever listA changes.
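A minimal pure-Python sketch of the idea, written from the standard textbook description of Aho-Corasick rather than from any library; build_automaton and find_positions_aho are hypothetical names. Like the question's loop, it records for each listA element the first listB position that contains it as a substring. For 500K patterns a compiled implementation (for example the third-party pyahocorasick package) would likely be much faster than this interpreted loop.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, failure links and per-node output lists."""
    goto = [{}]   # goto[node]: character -> child node
    fail = [0]    # fail[node]: node for the longest proper suffix present in the trie
    out = [[]]    # out[node]: indices of patterns that end at this node
    for p_idx, pattern in enumerate(patterns):
        node = 0
        for ch in pattern:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(p_idx)
    # Breadth-first order: depth-1 nodes keep fail = root, deeper nodes are
    # processed only after every shallower node is finished.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Patterns ending at the suffix node also end at this node.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_positions_aho(list_a, list_b):
    goto, fail, out = build_automaton(list_a)
    result = [None] * len(list_a)
    for j, text in enumerate(list_b):
        node = 0
        for ch in text:
            # On a mismatch, follow failure links instead of restarting.
            while node and ch not in goto[node]:
                node = fail[node]
            node = goto[node].get(ch, 0)
            for p_idx in out[node]:
                if result[p_idx] is None:
                    result[p_idx] = j  # first listB position containing the pattern
    return result


listA = ['123', '345', '678']
listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']
print(find_positions_aho(listA, listB))  # [0, 3, 2]
```

The breadth-first pass is what makes this work: a node's failure link always points to a shallower node, so that node's failure link and output list are already final when they are copied.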
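Try adding all the elements of the list to a set() and searching it. A membership ("in") test on a set takes constant time on average, so it is much faster than the same test on a list. Note that a set only answers exact membership, so the candidate substrings of each listB entry still have to be generated and tested.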
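The answer doesn't spell out how the set is searched. One possible reading, sketched below under that assumption, is to put listA in a set and test every substring of each listB entry whose length equals some listA element's length; find_positions_set is a hypothetical name. This pays off when listA elements come in only a few distinct lengths (in the example they are all length 3).

```python
def find_positions_set(list_a, list_b):
    # Assumption: substring matching as in the question's loop, testing only
    # windows whose length matches some element of list_a.
    wanted = set(list_a)                # constant-time average membership tests
    lengths = {len(a) for a in wanted}  # distinct pattern lengths
    first_pos = {}                      # pattern -> first list_b index containing it
    for j, b in enumerate(list_b):
        for k in lengths:
            for start in range(len(b) - k + 1):
                window = b[start:start + k]
                if window in wanted and window not in first_pos:
                    first_pos[window] = j
    return [first_pos.get(a) for a in list_a]


listA = ['123', '345', '678']
listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']
print(find_positions_set(listA, listB))  # [0, 3, 2]
```

The work is roughly the total length of listB times the number of distinct lengths in listA, so with many different lengths the Aho-Corasick approach scales better.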