I have a list of strings and a dataframe with a string column. For each row, I need to check whether one or more of the list items occur as a substring of the column value, then assign the matched value(s) to a new column, or NaN if there is no match. I need all matched parts of the string, not just the first one. So, for the third row of my df, the result would be both 'E' and 'F22'.
import numpy as np
import pandas as pd

df = pd.DataFrame({'type':['A23 E I28','I28 F A23', 'D41 E F22']})
matches = ['E', 'F22']
Is this what you're looking for?
If there's a match, the first matching keyword is assigned to a new column:
df['new_col'] = df['type'].str.extract(f"({'|'.join(matches)})")
        type new_col
0  A23 E I28       E
1  I28 F A23     NaN
2  D41 E F22       E
Edit: to get all matches rather than only the first one, use str.findall and join the results:
df['new_col'] = (df['type']
                 .str.findall(f"({'|'.join(matches)})")
                 .str.join(', ')
                 .replace('', np.nan))
        type new_col
0  A23 E I28       E
1  I28 F A23     NaN
2  D41 E F22  E, F22
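A possible variation on the findall approach, in case the keywords may contain regex metacharacters (such as "." or "+") or overlap each other (such as "F2" and "F22"). This is only a sketch under those assumptions; the names pattern and found are chosen for illustration, and the NaN step uses a length mask instead of replace.

```python
import re

import pandas as pd

df = pd.DataFrame({"type": ["A23 E I28", "I28 F A23", "D41 E F22"]})
matches = ["E", "F22"]

# Escape each keyword so it is matched literally, and try longer
# keywords first so "F22" is not shadowed by a shorter prefix.
pattern = "|".join(re.escape(m) for m in sorted(matches, key=len, reverse=True))

# One list of matched substrings per row, in order of appearance.
found = df["type"].str.findall(pattern)

# Join each list into a string; rows with an empty list become NaN.
df["new_col"] = found.str.join(", ").where(found.str.len() > 0)

print(df)
```

With the sample data from the question, the third row should end up with both keywords and the second row with NaN.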
I would do it this way:
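# Collect every keyword that occurs as a substring of the value.
df["match"] = df["type"].map(lambda s: ", ".join(m for m in matches if m in s))
# Rows without any match get NaN instead of an empty string.
df.loc[~df["type"].str.contains("|".join(matches)), "match"] = np.nan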
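Note that substring matching also reports 'E' for a value such as 'E12 A23'. If only whole, space-separated codes should count, something along these lines might work. It is a sketch that assumes the codes in type are separated by whitespace; match_tokens is a made-up helper name, and the fourth row is extra sample data added to show the difference.

```python
import numpy as np
import pandas as pd

# The last row is extra sample data, not part of the original question.
df = pd.DataFrame({"type": ["A23 E I28", "I28 F A23", "D41 E F22", "E12 A23"]})
matches = ["E", "F22"]


def match_tokens(text, keywords):
    """Hypothetical helper: keep keywords that appear as whole tokens in text."""
    tokens = set(text.split())
    hits = [k for k in keywords if k in tokens]  # keeps the order of keywords
    return ", ".join(hits) if hits else np.nan


df["match"] = df["type"].map(lambda s: match_tokens(s, matches))

print(df)
```

Here 'E12 A23' gets NaN, because 'E' is only part of a longer code in that row.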