
Find matching substrings in two lists

I have two lists, A and B. They have different lengths and both contain strings. What is the best way to find the substrings the two lists have in common?

list_A = ['hello','there','you','are']
list_B = ['say_hellaa','therefore','foursquare']

I would like a list of matching substrings called list_C which contains:

list_C = ['hell','there','are']

I came across this answer, but it requires me to have a list of matching substrings. Is there a way I can get what I want without manually creating a list of matching substrings?

This also does not help me, because the second list contains substrings.
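For the lists in the question, one way to build list_C without a hand-made list of substrings is to take the longest common substring of every (A, B) pair with the standard library's difflib.SequenceMatcher.find_longest_match, keeping only results of some minimum length. This is a minimal sketch: the helper name common_substrings and the threshold min_len=3 are assumptions (the question does not say how short a "match" may be; 3 happens to reproduce the expected output, since shorter overlaps such as 'he' or 're' are dropped).

```python
from difflib import SequenceMatcher

def common_substrings(list_a, list_b, min_len=3):
    """Hypothetical helper: longest common substring of every (a, b) pair,
    kept only if it is at least min_len characters long."""
    found = []
    for a in list_a:
        for b in list_b:
            # isjunk=None, and these strings are far shorter than the 200 items
            # at which the autojunk heuristic applies, so this is the true longest match
            m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
            if m.size >= min_len:
                sub = a[m.a:m.a + m.size]
                if sub not in found:  # de-duplicate, keep first-seen order
                    found.append(sub)
    return found

list_A = ['hello', 'there', 'you', 'are']
list_B = ['say_hellaa', 'therefore', 'foursquare']
list_C = common_substrings(list_A, list_B)
print(list_C)  # ['hell', 'there', 'are']
```

Raising or lowering min_len is the main tuning knob: min_len=2 would also admit pairs like 'you'/'foursquare' ('ou').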

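Since you tagged pandas, here is a solution using str.contains:

import pandas as pd

list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
S_A = pd.Series(list_A)
S_B = pd.Series(list_B)

# keep each element of S_B that appears literally inside any element of S_A
S_B[S_B.apply(lambda x: S_A.str.contains(x, regex=False)).any(axis=1)]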
Out[441]: 
0    hell
2    here
dtype: object
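To see what the pandas one-liner computes, here is a plain-Python sketch of the same boolean table, assuming the lists above. S_B.apply(...) builds one row per element of S_B and one column per element of S_A; with regex=False, str.contains does a literal substring test, which is what Python's in operator does. The names mask and kept are illustrative only.

```python
list_A = ['hello', 'there', 'you', 'are']
list_B = ['hell', 'is', 'here']

# mask[i][j] is True when list_B[i] occurs literally in list_A[j]
mask = [[b in a for a in list_A] for b in list_B]

# keep row i if any column is True, remembering its position like the Series index
kept = [(i, b) for i, (b, row) in enumerate(zip(list_B, mask)) if any(row)]
print(kept)  # [(0, 'hell'), (2, 'here')]
```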

This is one approach, using a list comprehension:

list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
jVal = "|".join(list_A)        # hello|there|you|are

print([i for i in list_B if i in jVal])

Output:

['hell', 'here']
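One caveat with joining on "|": an element of list_B that itself contains the separator can match across two joined words. The sketch below adds a hypothetical element 'o|t' to expose this, and compares it with a per-element check using any(), which never looks across word boundaries.

```python
list_A = ['hello', 'there', 'you', 'are']
list_B = ['hell', 'is', 'here', 'o|t']  # 'o|t' is a hypothetical entry for the edge case

jVal = "|".join(list_A)  # 'hello|there|you|are'
joined_hits = [i for i in list_B if i in jVal]

# test each list_A element separately, so no match can span two words
per_element_hits = [i for i in list_B if any(i in a for a in list_A)]

print(joined_hits)       # ['hell', 'here', 'o|t']
print(per_element_hits)  # ['hell', 'here']
```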

IIUC (if I understand correctly), I'd use NumPy:

import numpy as np

a = np.array(['hello', 'there', 'you', 'are', 'up', 'date'])
b = np.array(['hell', 'is', 'here', 'update'])

# np.char.find broadcasts a[:, None] (shape 6x1) against b (shape 4):
# entry [i, j] is the index of b[j] in a[i], or -1 if it is absent
bina = b[np.where(np.char.find(a[:, None], b) > -1)[1]]  # elements of b found in a
ainb = a[np.where(np.char.find(b, a[:, None]) > -1)[0]]  # elements of a found in b

np.append(bina, ainb)

array(['hell', 'here', 'up', 'date'], dtype='<U6')

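The two broadcast find calls are easier to follow when written out as nested lists. This pure-Python sketch builds the same two 6x4 tables and walks them in row-major order, which is the order np.where returns indices in, so it should reproduce the NumPy result. The names find_ab and find_ba are illustrative.

```python
a = ['hello', 'there', 'you', 'are', 'up', 'date']
b = ['hell', 'is', 'here', 'update']

# find_ab[i][j] == a[i].find(b[j]): where b[j] occurs inside a[i], or -1
find_ab = [[x.find(y) for y in b] for x in a]
# find_ba[i][j] == b[j].find(a[i]): where a[i] occurs inside b[j], or -1
find_ba = [[y.find(x) for y in b] for x in a]

bina = [b[j] for i in range(len(a)) for j in range(len(b)) if find_ab[i][j] > -1]
ainb = [a[i] for i in range(len(a)) for j in range(len(b)) if find_ba[i][j] > -1]
print(bina + ainb)  # ['hell', 'here', 'up', 'date']
```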
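list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
list_C = []

# compare every pair and keep whichever string is contained in the other
for a in list_A:
    for b in list_B:
        # print(a, "<->", b)  # uncomment to trace each comparison
        if a in b:
            list_C.append(a)
        if b in a:
            list_C.append(b)

print(list_C)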
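The nested loops can append the same string more than once, for example when one list_B entry occurs in two list_A entries, or when a and b are identical (both if branches fire). A small variant, sketched here with a hypothetical helper two_way_matches, iterates the same pairs via itertools.product and removes duplicates with dict.fromkeys, which keeps first-seen order.

```python
from itertools import product

def two_way_matches(list_a, list_b):
    """Hypothetical helper: every element that is a substring of an element
    in the other list, without duplicates, in first-seen order."""
    hits = []
    for a, b in product(list_a, list_b):  # same pairs as the nested loops
        if a in b:
            hits.append(a)
        if b in a:
            hits.append(b)
    return list(dict.fromkeys(hits))

print(two_way_matches(['hello', 'there', 'you', 'are'], ['hell', 'is', 'here']))
# ['hell', 'here']
print(two_way_matches(['hello', 'shell'], ['hell']))
# ['hell']  (the plain loops would give ['hell', 'hell'])
```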

For funsies, here's an answer that uses regex!

import re

matches = []
for pat in list_B:
    # re.escape makes each list_B entry match literally, not as a regex
    matches.append(re.search(re.escape(pat), ' '.join(list_A)))
matches = [mat.group() for mat in matches if mat]
print(matches)
# ['hell', 'here']

re.search returns a match object when it finds a match, and the matched string itself is retrieved with match.group(). If no match is found (as with the second element of your list_B), re.search returns None instead, which is why the list comprehension ends with if mat.
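Passing list_B entries straight to re.search treats them as regular expressions, so characters such as "." change the meaning of the search. The sketch below uses a hypothetical entry 'a.e' to show the difference re.escape makes. It also shows that joining list_A with spaces still lets a pattern span two words, even when escaped.

```python
import re

text = ' '.join(['hello', 'there', 'you', 'are'])  # 'hello there you are'
pattern = 'a.e'  # hypothetical list_B entry meant literally: 'a', '.', 'e'

print(re.search(pattern, text).group())     # 'are'  ('.' matched the 'r')
print(re.search(re.escape(pattern), text))  # None

# escaping does not stop a match from crossing the joining space
print(re.search(re.escape('o t'), text).group())  # 'o t' from 'hello there'
```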

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
