
Pattern Match in List of Strings, Create New Column in pandas

I have a pandas dataframe with the following general format:

id,product_name_extract
1,00012CDN
2,14311121NDC
3,NDC37ba
4,47CD27

I also have a list of product codes I would like to match (unfortunately, I have to do NLP extraction, so it will not be a clean match) and then create a new column with the matching list value:

product_name = ['12CDN','21NDC','37ba','7CD2']

id,product_name_extract,product_name_mapped
1,00012CDN,12CDN
2,14311121NDC,21NDC
3,NDC37ba,37ba
4,47CD27,7CD2

I am not too worried about there being collisions.

This would be easy enough if I just needed a True/False indicator: I could use str.contains with the list values joined by "|" for alternation. But I am a bit stumped on how to create a column that holds the exact matching value. Any tips or tricks appreciated!
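For reference, the True/False version described above might look like the following minimal sketch. The DataFrame is rebuilt from the sample data in the question; the `pattern` and `has_match` names are illustrative, and the `re.escape` call is an added precaution (an assumption, not something the question requires) in case a product code contains regex metacharacters.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_name_extract": ["00012CDN", "14311121NDC", "NDC37ba", "47CD27"],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# Join the codes with "|" so the regex matches any one of them.
# re.escape keeps characters such as "." or "+" literal.
pattern = "|".join(re.escape(code) for code in product_name)

# Boolean Series: True where any code occurs somewhere in the string.
has_match = df["product_name_extract"].str.contains(pattern)
```

This only says whether some code matched, not which one, which is the gap the question is asking about.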

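Since you're not worried about collisions, you can join your product_name list with | (regex alternation) and use the result as a pattern. str.findall returns a list of the matches found in each row, and .str[0] takes the first one: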

df['product_name_mapped'] = (df.product_name_extract.str
                             .findall('|'.join(product_name))
                             .str[0])

Result:

>>> df
   id product_name_extract product_name_mapped
0   1             00012CDN               12CDN
1   2          14311121NDC               21NDC
2   3              NDC37ba                37ba
3   4               47CD27                7CD2
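A possible variant, sketched here under a few assumptions, uses `str.extract` with a single capture group instead of `findall(...).str[0]`. The fifth row (`XYZ999`) is hypothetical and added only to show what happens when nothing matches; escaping the codes with `re.escape` and sorting them longest-first are also additions not in the original answer, meant to guard against regex metacharacters and against one code being a prefix of another.

```python
import re

import pandas as pd

# Sample data from the question, plus one hypothetical row with no match.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "product_name_extract": [
        "00012CDN", "14311121NDC", "NDC37ba", "47CD27", "XYZ999",
    ],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# Longest codes first, so a longer code is tried before a shorter one
# that is its prefix; re.escape treats every code as literal text.
codes = sorted(product_name, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(code) for code in codes) + ")"

# expand=False returns a Series holding the first match per row;
# rows without any match get a missing value (NaN).
df["product_name_mapped"] = df["product_name_extract"].str.extract(
    pattern, expand=False
)
```

Rows with no matching code end up as missing values, so they can be filtered with `df["product_name_mapped"].isna()` or filled with a placeholder using `fillna`.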
