I have a list of regex patterns that I want to match against a string. How can I find all the regex patterns that match the string? I tried taking the patterns one by one in a for loop and matching each against the string, but this causes a huge delay in processing.
import pandas as pd
fruits=["fruitmangosweet","fruitmangosour","vegtomatosweet","potato"]
df=pd.DataFrame(columns=["item","regex"])  # a list, not a set: sets are unordered
df["item"]=fruits
#dataframe in hand
              item regex
0  fruitmangosweet   NaN
1   fruitmangosour   NaN
2   vegtomatosweet   NaN
3           potato   NaN
rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
#desired output
              item                regex
0  fruitmangosweet  .*sweet.*,.*fruit.*
1   fruitmangosour            .*fruit.*
2   vegtomatosweet    .*sweet.*,.*veg.*
3           potato                  NaN
#Tried
import re
for itm in df.item.values:
    conct_pattern=",".join(rgx for rgx in rgx_patterns if re.match(rgx,itm))
    df.loc[df['item']==itm,'regex']=conct_pattern
print(df)
#result
              item                regex
0  fruitmangosweet  .*sweet.*,.*fruit.*
1   fruitmangosour            .*fruit.*
2   vegtomatosweet    .*sweet.*,.*veg.*
3           potato
#I am getting the correct results with the above code, but it takes too much time to process. My objective is to optimize the code.
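One likely contributor to the delay is the `df.loc[df['item']==itm,'regex']` assignment, which scans the whole `item` column once for every item. The following is a minimal sketch, assuming the same `fruits` and `rgx_patterns` as above, that collects the matches in a plain list and assigns the column once. The helper name `matching_patterns` is made up for illustration, and the actual gain should be measured on the real data.

```python
import re

import pandas as pd

fruits = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato"]
rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]


def matching_patterns(item, patterns):
    """Return the patterns that match `item`, comma-joined in list order."""
    return ",".join(p for p in patterns if re.match(p, item))


df = pd.DataFrame({"item": fruits})
# Build the whole column in one pass instead of one boolean-mask lookup per item.
df["regex"] = [matching_patterns(item, rgx_patterns) for item in df["item"]]
print(df)
```

Items with no matching pattern get an empty string, as in the original loop.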
I created a list of pre-compiled pattern objects. This avoids compiling the same patterns again on every pass through the loop before calling match().
rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
#array of pre compiled objects.
rgx_obj=[re.compile(i) for i in rgx_patterns]
def rgx_finder(itm):
    # set comprehension: drops duplicate patterns, but the join order is not guaranteed
    cn_patterns=",".join({rgx.pattern for rgx in rgx_obj if rgx.match(itm)})
    return cn_patterns
df['regex']=df['item'].apply(rgx_finder)
print(df)
              item                regex
0  fruitmangosweet  .*fruit.*,.*sweet.*
1   fruitmangosour            .*fruit.*
2   vegtomatosweet    .*veg.*,.*sweet.*
3           potato
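The `apply` call above still runs a Python function once per row. As an alternative sketch (not the method used in this answer), the matching can be done one pattern at a time over the whole column with `Series.str.match`, which anchors at the start of the string like `re.match`. The intermediate frame `hits` is a name chosen here for illustration, and whether this is faster depends on the number of rows and patterns, so it is worth timing.

```python
import pandas as pd

fruits = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato"]
rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]

df = pd.DataFrame({"item": fruits})

# One boolean column per pattern: True where the pattern matches the item.
hits = pd.DataFrame({p: df["item"].str.match(p) for p in rgx_patterns})

mask = hits.to_numpy(dtype=bool)   # shape: (n_items, n_patterns)
cols = hits.columns.to_numpy()     # the pattern strings, in list order

# For each row, join the patterns whose column is True.
df["regex"] = [",".join(cols[row]) for row in mask]
print(df)
```

Unlike the set-based join, this keeps the patterns in the order of `rgx_patterns`.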