Get all the regexes from a list of regex patterns that match a given string

I have a list of regex patterns that I want to match against a string. How can I find all the regex patterns that match the string? I tried taking the patterns one by one in a for loop and matching each against the string, but this takes a very long time to process.

import pandas as pd
fruits=["fruitmangosweet","fruitmangosour","vegtomatosweet","potato"]
#use a list for columns: a set is unordered, so the column order would not be guaranteed
df=pd.DataFrame(columns=["item","regex"])
df["item"]=fruits
#dataframe in hand
              item regex
0  fruitmangosweet   NaN
1   fruitmangosour   NaN
2   vegtomatosweet   NaN
3           potato   NaN

rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]

#desired output
              item regex
0  fruitmangosweet   .*sweet.*,.*fruit.*
1   fruitmangosour   .*fruit.*
2   vegtomatosweet   .*sweet.*,.*veg.*
3           potato   NaN

#Tried
import re
for itm in df.item.values:
    conct_pattern=",".join(rgx for rgx in rgx_patterns if re.match(rgx,itm))
    df.loc[df['item']==itm,'regex']=conct_pattern
print(df)

#result
              item                regex
0  fruitmangosweet  .*sweet.*,.*fruit.*
1   fruitmangosour            .*fruit.*
2   vegtomatosweet    .*sweet.*,.*veg.*
3           potato                     

#I am getting the correct results with the code above, but it takes too long to run. My objective is to optimize it.

Create a list of pre-compiled pattern objects. This avoids having to compile the same pattern (or look it up in the re module's cache) on every iteration of the loop before calling match().

import re
rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
#list of pre-compiled pattern objects
rgx_obj=[re.compile(i) for i in rgx_patterns]
def rgx_finder(itm):
    #the set comprehension drops duplicates but does not preserve the order of rgx_patterns
    cn_patterns=",".join({rgx.pattern for rgx in rgx_obj if rgx.match(itm)})
    return cn_patterns

df['regex']=df['item'].apply(rgx_finder)
print(df)

               regex             item
0  .*fruit.*,.*sweet.*  fruitmangosweet
1            .*fruit.*   fruitmangosour
2    .*veg.*,.*sweet.*   vegtomatosweet
3                                potato
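
If the order of the patterns in the output matters, or if the same item appears many times in the column, a variation on rgx_finder might help. The following is only a sketch under two assumptions that are not stated in the question: that the output should follow the order of rgx_patterns, and that items repeat often enough for caching to pay off. The name rgx_finder_ordered is made up for illustration.

```python
import re
from functools import lru_cache

rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]
# Compile each pattern once, keeping the original list order.
rgx_obj = [re.compile(p) for p in rgx_patterns]

@lru_cache(maxsize=None)  # assumption: repeated items make caching worthwhile
def rgx_finder_ordered(itm):
    # A generator (not a set) keeps the patterns in rgx_patterns order.
    return ",".join(rgx.pattern for rgx in rgx_obj if rgx.match(itm))

# Sample items; "potato" is repeated to show the cache being reused.
items = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato", "potato"]
results = [rgx_finder_ordered(itm) for itm in items]
print(results)

# With the DataFrame from the question this would be used as:
# df["regex"] = df["item"].apply(rgx_finder_ordered)
```

Note that an item with no matching pattern gets an empty string here, not NaN as in the desired output, so a follow-up replacement step would still be needed if NaN is required.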
