[英]Find matching substrings in two lists
I have two lists: A and B. List lengths are not the same and they both contain strings. 我有两个列表:A和B。列表长度不同,并且都包含字符串。 What is the best way to match substrings in both the lists?
在两个列表中匹配子字符串的最佳方法是什么?
list_A = ['hello','there','you','are']
list_B = ['say_hellaa','therefore','foursquare']
I would like a list of matching substrings called list_C which contains: 我想要一个名为list_C的匹配子字符串的列表,其中包含:
list_C = ['hell','there','are']
I came across this answer, but it requires me to have a list of matching substrings. 我遇到了这个答案,但它要求我有一个匹配子字符串的列表。 Is there a way I can get what I want without manually creating a list of matching substrings?
有没有一种方法可以无需手动创建匹配子字符串列表就可以得到想要的?
This also does not help me cause the second list contains substrings. 这也无济于事,因为第二个列表包含子字符串。
Since you tag pandas
solution from str.contains
由于您从
str.contains
标记了pandas
解决方案
#S_A=pd.Series(list_A)
#S_B=pd.Series(list_B)
S_B[S_B.apply(lambda x : S_A.str.contains(x)).any(1)]
Out[441]:
0 hell
2 here
dtype: object
This is one approach. 这是一种方法。 Using a
list comprehension
. 使用
list comprehension
。
list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
jVal = "|".join(list_A) # hello|there|you|are
print([i for i in list_B if i in jVal ])
Output: 输出:
['hell', 'here']
IIUC: I'd use Numpy IIUC:我会用Numpy
import numpy as np
from numpy.core.defchararray import find
a = np.array(['hello', 'there', 'you', 'are', 'up', 'date'])
b = np.array(['hell', 'is', 'here', 'update'])
bina = b[np.where(find(a[:, None], b) > -1)[1]]
ainb = a[np.where(find(b, a[:, None]) > -1)[0]]
np.append(bina, ainb)
array(['hell', 'here', 'up', 'date'], dtype='<U6')
list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
List_C = []
for a in list_A:
for b in list_B:
print(a,"<->",b)
if a in b:
List_C.append(a)
if b in a:
List_C.append(b)
print(List_C)
For funsies, here's an answer that uses regex! 对于娱乐,这是使用正则表达式的答案!
import re
matches = []
for pat in list_B:
matches.append(re.search(pat, ' '.join(list_A)))
matches = [mat.group() for mat in matches if mat]
print(matches)
# ['hell', 'here']
This returns a match object for each match that is found, the actual string of which is found by match.group()
. 这将为找到的每个匹配项返回一个匹配对象,该对象的实际字符串由
match.group()
找到。 Note that if no match is found (as is the case for the second element in your list_B
), you get a None
in matches
, thus the need to add the if mat
at the end of the list comprehension. 需要注意的是,如果没有发现匹配(如在你的第二个元素的情况下
list_B
),你会得到一个None
在matches
,因此需要添加的if mat
在列表理解的结束。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.