Get all the regex patterns from a list that match a given string
I have a list of regex patterns that I want to match against a string. How can I find all the patterns that match the string? I tried taking the patterns one by one in a for loop and matching each one against the string, but this causes a long delay during processing.
import pandas as pd
fruits=["fruitmangosweet","fruitmangosour","vegtomatosweet","potato"]
df=pd.DataFrame(columns=["item","regex"])  # a list, not a set: a set gives no guaranteed column order
df["item"]=fruits
#DataFrame so far
item regex
0 fruitmangosweet NaN
1 fruitmangosour NaN
2 vegtomatosweet NaN
3 potato NaN
rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
#desired output
item regex
0 fruitmangosweet .*sweet.*,.*fruit.*
1 fruitmangosour .*fruit.*
2 vegtomatosweet .*sweet.*,.*veg.*
3 potato NaN
#Tried
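import re
for itm in df.item.values:
    # Join every pattern that matches this item (re.match anchors at the start)
    conct_pattern=",".join(rgx for rgx in rgx_patterns if re.match(rgx,itm))
    # This boolean mask compares against the whole column on every iteration
    df.loc[df['item']==itm,'regex']=conct_pattern
print(df)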
#result
item regex
0 fruitmangosweet .*sweet.*,.*fruit.*
1 fruitmangosour .*fruit.*
2 vegtomatosweet .*sweet.*,.*veg.*
3 potato
#I get the correct results with the code above, but it takes too long to run. My goal is to make it faster.
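Part of the slowness in the loop above likely comes from `df.loc[df['item']==itm,'regex']`: the comparison scans the entire column for every item, so the loop does roughly O(n²) work on n rows, on top of the per-call overhead of `.loc` assignment. A minimal sketch of an alternative, assuming the `re.match` semantics of the original code are what is wanted: loop over the (few) patterns instead of the (many) rows using pandas' `Series.str.match`, then assign the whole column once. The name `hits` is illustrative only, and whether this is faster on real data is something to measure.

```python
import pandas as pd

fruits = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato"]
rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]
df = pd.DataFrame({"item": fruits})

# One boolean Series per pattern. Series.str.match uses re.match semantics
# (anchored at the start of each string); na=False treats missing items as no match.
hits = [df["item"].str.match(p, na=False) for p in rgx_patterns]

# zip(*hits) yields one tuple of booleans per row, in pattern order,
# so each joined string keeps the order of rgx_patterns.
df["regex"] = [
    ",".join(p for p, hit in zip(rgx_patterns, row) if hit)
    for row in zip(*hits)
]
print(df)
```

Each pattern is compiled once per `str.match` call, and the column is written in a single assignment instead of once per row.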
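Create a list of pre-compiled pattern objects once, outside the loop, and call each object's match() method. This avoids compiling the same pattern on every iteration. Python's re module does cache recently compiled patterns, but the cache is limited in size, so with a long pattern list re.match may end up recompiling patterns repeatedly; compiling once up front also skips the cache lookup on every call.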
import re
rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
# List of pre-compiled pattern objects, built once outside the per-row function
rgx_obj=[re.compile(i) for i in rgx_patterns]
def rgx_finder(itm):
    # A generator (rather than a set) keeps the matches in rgx_patterns order
    cn_patterns=",".join(rgx.pattern for rgx in rgx_obj if rgx.match(itm))
    return cn_patterns
df['regex']=df['item'].apply(rgx_finder)
print(df)
item regex
0 fruitmangosweet .*sweet.*,.*fruit.*
1 fruitmangosour .*fruit.*
2 vegtomatosweet .*sweet.*,.*veg.*
3 potato
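If, as in the sample, every pattern has the form `.*literal.*` with no other regex metacharacters, then `re.match` reduces to a plain substring test (for strings without newlines, since `.` does not match `\n` unless `re.DOTALL` is set). That skips the regex engine entirely. This is an assumption about the real patterns, so the sketch below uses a hypothetical helper, `literal_core`, to detect that shape and falls back to a compiled regex for any pattern that does not fit it. `checks` and `matching_patterns` are likewise illustrative names.

```python
import re

import pandas as pd

fruits = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato"]
rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]
df = pd.DataFrame({"item": fruits})


def literal_core(pattern):
    """Hypothetical helper: return the literal in a '.*literal.*' pattern, else None.

    The core qualifies only if re.escape leaves it unchanged, i.e. it contains
    no characters that re.escape would treat as special.
    """
    if len(pattern) >= 4 and pattern.startswith(".*") and pattern.endswith(".*"):
        core = pattern[2:-2]
        if core and re.escape(core) == core:
            return core
    return None


# Build one test function per pattern, once: a substring check where the
# assumption holds, otherwise the bound .match method of a compiled regex.
checks = []
for p in rgx_patterns:
    core = literal_core(p)
    if core is not None:
        checks.append((p, lambda s, c=core: c in s))  # c=core binds the current literal
    else:
        checks.append((p, re.compile(p).match))


def matching_patterns(text):
    # Assumes every item is a str; keeps the order of rgx_patterns.
    return ",".join(p for p, check in checks if check(text))


df["regex"] = df["item"].map(matching_patterns)
print(df)
```

The fallback keeps the result identical to the regex version for any pattern the helper does not recognise, so it is safe to try on a mixed pattern list; the speedup, if any, depends on how many patterns qualify.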