Get all regexes from a list of regex patterns that match a given string

Question

I have a list of regex patterns that I want to match against a string. How can I find all the regex patterns that match the string? I tried looping over the patterns with a for loop and matching each one against the string, but this causes a huge delay during processing.

import pandas as pd
fruits=["fruitmangosweet","fruitmangosour","vegtomatosweet","potato"]
df=pd.DataFrame(columns=["item","regex"])  # a list keeps the column order fixed (a set does not)
df["item"]=fruits
# DataFrame at this point:
#               item  regex
# 0  fruitmangosweet    NaN
# 1   fruitmangosour    NaN
# 2   vegtomatosweet    NaN
# 3           potato    NaN

rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]

# Desired output:
#               item                regex
# 0  fruitmangosweet  .*sweet.*,.*fruit.*
# 1   fruitmangosour            .*fruit.*
# 2   vegtomatosweet    .*sweet.*,.*veg.*
# 3           potato                  NaN

#Tried
import re
for itm in df.item.values:
    conct_pattern=",".join(rgx for rgx in rgx_patterns if re.match(rgx,itm))
    df.loc[df['item']==itm,'regex']=conct_pattern
print(df)

# Result:
#               item                regex
# 0  fruitmangosweet  .*sweet.*,.*fruit.*
# 1   fruitmangosour            .*fruit.*
# 2   vegtomatosweet    .*sweet.*,.*veg.*
# 3           potato

I get the correct results with the code above, but it takes far too long to process. My goal is to optimize the code.
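One likely contributor to the slowdown in the loop above (an observation about the code, not a measurement): `df.loc[df['item']==itm,'regex']` builds a boolean mask over the whole column for every item, so the total work grows roughly quadratically with the number of rows. A minimal sketch, assuming the same `fruits` and `rgx_patterns` as above, that keeps the same `re.match` logic but computes the whole column in one pass and assigns it once:

```python
import re

import pandas as pd

fruits = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato"]
rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]
df = pd.DataFrame({"item": fruits})

# One pass over the column: for each item, comma-join the patterns that match it
# (same re.match logic as the loop above), then assign the finished list once
# instead of filtering df with a boolean mask for every item.
df["regex"] = [
    ",".join(rgx for rgx in rgx_patterns if re.match(rgx, itm))
    for itm in df["item"]
]
print(df)
```

As a side effect, rows with duplicate items are each computed once in order rather than being rewritten by every matching mask.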

Answer 1

I created a list of pre-compiled pattern objects. This avoids compiling the same pattern on every pass through the loop when calling match().

import re

rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
# list of pre-compiled pattern objects, built once
rgx_obj=[re.compile(i) for i in rgx_patterns]

def rgx_finder(itm):
    # a list (not a set) keeps the patterns in rgx_patterns order,
    # so the joined string is the same on every run
    cn_patterns=",".join([rgx.pattern for rgx in rgx_obj if rgx.match(itm)])
    return cn_patterns

df['regex']=df['item'].apply(rgx_finder)
print(df)

#               item                regex
# 0  fruitmangosweet  .*sweet.*,.*fruit.*
# 1   fruitmangosour            .*fruit.*
# 2   vegtomatosweet    .*sweet.*,.*veg.*
# 3           potato
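Python's `re` module already caches recently compiled patterns, so pre-compiling mainly saves the per-call cache lookup. If, as in the question, every pattern has the form `.*literal.*`, a further option is to skip the regex engine for those patterns: with `re.match`, such a pattern matches a single-line string exactly when the literal occurs in it, so a plain `in` test gives the same answer. The sketch below assumes the items contain no newlines; `make_matcher` and `find_patterns` are hypothetical helper names, and any pattern that is not a plain `.*literal.*` falls back to a pre-compiled regex.

```python
import re

import pandas as pd

fruits = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato"]
rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]


def make_matcher(pattern):
    """Hypothetical helper: return a predicate equivalent to re.match(pattern, s)
    for strings s that contain no newline.

    Assumption: '.*LITERAL.*' patterns, where LITERAL contains nothing that
    re.escape() would escape, are checked with a substring test; every other
    pattern falls back to a pre-compiled regex.
    """
    if pattern.startswith(".*") and pattern.endswith(".*"):
        core = pattern[2:-2]
        if core and re.escape(core) == core:
            return lambda s: core in s
    compiled = re.compile(pattern)
    return lambda s: compiled.match(s) is not None


# Build the matchers once, keeping rgx_patterns order.
matchers = [(p, make_matcher(p)) for p in rgx_patterns]


def find_patterns(itm):
    # Hypothetical helper: comma-join every pattern whose matcher accepts itm.
    return ",".join(p for p, matches in matchers if matches(itm))


df = pd.DataFrame({"item": fruits})
df["regex"] = df["item"].map(find_patterns)
print(df)
```

Whether this is noticeably faster depends on the real data and patterns, so it is worth timing it against the pre-compiled version on the actual column.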

Answer 1 (score: 0, 2022-01-14 18:31:21)

从匹配给定字符串的正则表达式模式列表中获取所有正则表达式

问题描述

1 个解决方案

解决方案1 0 2022-01-14 18:31:21

解决方案1
0 2022-01-14 18:31:21