I am trying to check whether any word from ColA is contained in ColB, row by row, in a pandas DataFrame.
Example data:
ColA               | ColB            | Match
this is some text  | some text       | TRUE
some more text     | more            | TRUE
another line text  | nothing to see  | FALSE
my final line      | dog cats goats  | FALSE
In pseudocode: split desc into words and split emp into words; if any word in emp equals any word in desc, then true, else false.
Something like this:
df['Match'] = df['colA'].str.split().apply(lambda x: 'true' if any x in df['ColB'].str.split() else 'false')
thx
You can use apply across each whole row, like this:
df.apply(lambda x: np.any([word in x.ColB.split(' ') for word in x.ColA.split(' ')]),axis = 1)
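For anyone who wants to run the row-wise approach end to end, here is a minimal sketch. The DataFrame is rebuilt from the question's example data (the exact split of each row into ColA and ColB is my assumption), the helper name `row_has_common_word` is hypothetical, and the built-in `any` is used in place of `np.any` so no NumPy import is needed.

```python
import pandas as pd

# Sample data rebuilt from the question; the ColA/ColB split is an assumption.
df = pd.DataFrame({
    "ColA": ["this is some text", "some more text",
             "another line text", "my final line"],
    "ColB": ["some text", "more", "nothing to see", "dog cats goats"],
})

def row_has_common_word(row):
    """Return True if any whole word of ColA also appears in ColB."""
    words_b = row["ColB"].split()
    return any(word in words_b for word in row["ColA"].split())

# axis=1 passes each row (as a Series) to the helper.
df["Match"] = df.apply(row_has_common_word, axis=1)
print(df["Match"].tolist())  # [True, True, False, False]
```

Row-wise apply is easy to read but calls a Python function once per row, so on large frames a list comprehension over zip may be quicker.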
Maybe using issubset (this checks that every word in ColB appears in ColA):
[set(y).issubset(set(x)) for x , y in zip(df.ColA.str.split(),df.ColB.str.split())]
Out[57]: [True, True, False, False]
If we need only one match:
[len(list(set(x) & set(y)))>0 for x , y in zip(df.ColA.str.split(),df.ColB.str.split())]
Out[61]: [True, True, False, False]
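The two checks give the same result on the question's data, but they are not equivalent in general. A small sketch with a made-up pair of strings (not from the question) shows where they differ; `isdisjoint` is shown as an alternative spelling of the "at least one shared word" test.

```python
# Hypothetical strings chosen so that only one word is shared.
col_a = "this is some text"
col_b = "some other words"

words_a = set(col_a.split())
words_b = set(col_b.split())

# issubset: every word of ColB must appear in ColA.
all_b_in_a = words_b.issubset(words_a)

# Intersection: at least one word is shared.
any_shared = len(words_a & words_b) > 0

# Same test without building the intersection set.
any_shared_alt = not words_a.isdisjoint(words_b)

print(all_b_in_a, any_shared, any_shared_alt)  # False True True
```

So if the goal is "any word from ColA is contained in ColB", the intersection (or `isdisjoint`) form is the one that matches the question.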
You can use a list comprehension with zip and a custom function:
def find_words(words, val):
    val_split = val.split()
    return any(x in val_split for x in words.split())
df['Match'] = [find_words(a, b) for a, b in zip(df['ColA'], df['ColB'])]
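One thing worth noting about this function: because it tests membership in the split list rather than in the raw string, it matches whole words only. A quick sketch, using plain strings instead of a DataFrame (the "cat" example is my own, not from the question):

```python
def find_words(words, val):
    """Return True if any whole word in `words` is also a word in `val`."""
    val_split = val.split()
    return any(x in val_split for x in words.split())

# Shared whole word "more".
whole_word = find_words("some more text", "more")

# "cat" is not one of the words in "dog cats goats".
partial_word = find_words("cat", "dog cats goats")

# A plain substring test on the raw string would match inside "cats".
substring_test = "cat" in "dog cats goats"

print(whole_word, partial_word, substring_test)  # True False True
```

If substring matches are actually wanted, the function would need to test against `val` itself rather than `val_split`.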