
Pattern Match in List of Strings, Create New Column in pandas

I have a pandas dataframe with the following general format:

id,product_name_extract
1,00012CDN
2,14311121NDC
3,NDC37ba
4,47CD27
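To try the snippets on this page, the sample above can be loaded as a DataFrame. This is a minimal sketch assuming the data arrives as CSV text; with a real file, the `io.StringIO` wrapper would be replaced by the file path.

```python
import io

import pandas as pd

# The sample data from the question, as CSV text
csv_text = """id,product_name_extract
1,00012CDN
2,14311121NDC
3,NDC37ba
4,47CD27
"""

# Reading the extract column as str guards against purely numeric
# codes losing their leading zeros
df = pd.read_csv(io.StringIO(csv_text), dtype={"product_name_extract": str})
print(df)
```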

I also have a list of product codes I would like to match (unfortunately, I have to do NLP extraction, so it will not be a clean match) and then create a new column with the matching list value:

product_name = ['12CDN','21NDC','37ba','7CD2']

id,product_name_extract,product_name_mapped
1,00012CDN,12CDN
2,14311121NDC,21NDC
3,NDC37ba,37ba
4,47CD27,7CD2

I am not too worried about there being collisions.

This would be easy enough if I just needed a True/False indicator, using `str.contains` with the list values concatenated together with "|" for alternation, but I am a bit stumped on how to create a column holding the exact value that matched. Any tips or tricks appreciated!
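For reference, the True/False indicator described above could look like the sketch below, which rebuilds the sample frame inline. The column name `has_product` is hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_name_extract": ["00012CDN", "14311121NDC", "NDC37ba", "47CD27"],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# Alternation pattern: "12CDN|21NDC|37ba|7CD2"
pattern = "|".join(product_name)

# True wherever any of the codes appears somewhere in the string,
# but it does not say *which* code matched
df["has_product"] = df["product_name_extract"].str.contains(pattern, regex=True)
print(df)
```

Every sample row contains one of the codes, so the indicator is `True` throughout; the open question is how to recover the matched code itself.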

Since you're not worried about collisions, you can join your product_name list with the | (alternation) operator and use that as a regex:

df['product_name_mapped'] = (df.product_name_extract.str
                             .findall('|'.join(product_name))
                             .str[0])

Result:

>>> df
   id product_name_extract product_name_mapped
0   1             00012CDN               12CDN
1   2          14311121NDC               21NDC
2   3              NDC37ba                37ba
3   4               47CD27                7CD2
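A possible variation, not part of the original answer: `Series.str.extract` with a single capture group returns the matched code directly, without the intermediate lists that `findall` builds. Wrapping each code in `re.escape` is an extra precaution, assuming real product codes might contain regex metacharacters (the sample codes do not). The last lines use made-up strings to show the edge cases.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_name_extract": ["00012CDN", "14311121NDC", "NDC37ba", "47CD27"],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# re.escape makes each code match literally, even if it contains
# characters such as "." or "+"; the outer parentheses form the
# single capture group that str.extract needs
pattern = "(" + "|".join(map(re.escape, product_name)) + ")"

# With one group and expand=False, str.extract returns a Series holding
# the first (leftmost) match in each string, or NaN when nothing matches
df["product_name_mapped"] = df["product_name_extract"].str.extract(pattern, expand=False)
print(df)

# Made-up strings (not from the question) illustrating the edge cases:
# no code present -> NaN; two codes present -> the leftmost one wins
edge = pd.Series(["XYZ999", "A21NDC37ba"]).str.extract(pattern, expand=False)
print(edge)
```

The `findall(...).str[0]` one-liner behaves the same way on these edge cases: an empty match list becomes `NaN`, and the code occurring earliest in the string is kept regardless of its position in `product_name`.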

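Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.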
