Pattern matching against a list of strings to create a new column in pandas

Question

I have a pandas dataframe with the following general format:

id,product_name_extract
1,00012CDN
2,14311121NDC
3,NDC37ba
4,47CD27

I also have a list of product codes I would like to match (unfortunately, I have to do NLP extraction, so it will not be a clean match) and then create a new column with the matching list value:

product_name = ['12CDN','21NDC','37ba','7CD2']

id,product_name_extract,product_name_mapped
1,00012CDN,12CDN
2,14311121NDC,21NDC
3,NDC37ba,37ba
4,47CD27,7CD2

I am not too worried about there being collisions.

This would be easy enough if I just needed a True/False indicator, using contains with the list values joined by "|" for alternation, but I am a bit stumped on how to create a column holding the exact matching value. Any tips or tricks appreciated!
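For reference, the True/False version described above might look roughly like the sketch below. The dataframe is rebuilt from the sample rows in the question; the names `pattern` and `has_match` are illustrative only, and `re.escape` is added on the assumption that real product codes could contain regex metacharacters.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_name_extract": ["00012CDN", "14311121NDC", "NDC37ba", "47CD27"],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# Join the codes with "|" so the regex matches any one of them;
# re.escape keeps characters such as "." or "+" literal (an assumption
# about the real data, the sample codes do not need it)
pattern = "|".join(re.escape(code) for code in product_name)

# Boolean Series: True where the text contains at least one code
has_match = df["product_name_extract"].str.contains(pattern, regex=True)
```

This only says whether some code is present, not which one, which is the gap the question is asking about.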

Answer 1

Since you're not worried about collisions, you can join your product_name list with the | operator and use that as a regex. findall returns every match for each row, and .str[0] keeps the first one:

df['product_name_mapped'] = (df.product_name_extract.str
                             .findall('|'.join(product_name))
                             .str[0])

Result:

>>> df
   id product_name_extract product_name_mapped
0   1             00012CDN               12CDN
1   2          14311121NDC               21NDC
2   3              NDC37ba                37ba
3   4               47CD27                7CD2
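A self-contained variant of the same idea, as a sketch rather than the answer's exact code: it swaps in `str.extract` with a capture group instead of `findall(...).str[0]`, escapes each code with `re.escape`, and sorts longer codes first. The escaping and the sorting are assumptions about what real data might need; the sample codes work without them.

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_name_extract": ["00012CDN", "14311121NDC", "NDC37ba", "47CD27"],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# Alternation tries options left to right at each position, so listing
# longer codes first lets a longer code win over a shorter one that is
# a prefix of it
codes = sorted(product_name, key=len, reverse=True)

# One capture group wrapping all escaped codes
pattern = "(" + "|".join(re.escape(code) for code in codes) + ")"

# With a single group and expand=False, str.extract returns a Series
# holding the first match per row, or NaN where no code is found
df["product_name_mapped"] = df["product_name_extract"].str.extract(
    pattern, expand=False
)
```

Both approaches keep only the first match in each string, so a row containing two different codes is mapped to whichever appears earliest.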

Answer 1
4, accepted, 2018-08-28 20:54:08
