
Get all the regex patterns from a list of regex patterns that match a given string

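I have a list of regex patterns that I want to match against a string. How can I find all the regex patterns that match the string? I tried a for loop that takes the patterns one at a time and matches each against the string, but this makes processing very slow.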

import pandas as pd
fruits=["fruitmangosweet","fruitmangosour","vegtomatosweet","potato"]
df=pd.DataFrame(columns=["item","regex"])  # a list keeps the column order fixed; a set is unordered
df["item"]=fruits
#dataframe in hand
              item regex
0  fruitmangosweet   NaN
1   fruitmangosour   NaN
2   vegtomatosweet   NaN
3           potato   NaN

rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]

#desired output
              item regex
0  fruitmangosweet   .*sweet.*,.*fruit.*
1   fruitmangosour   .*fruit.*
2   vegtomatosweet   .*sweet.*,.*veg.*
3           potato   NaN

#Tried
import re
for itm in df.item.values:
    conct_pattern=",".join(rgx for rgx in rgx_patterns if re.match(rgx,itm))
    df.loc[df['item']==itm,'regex']=conct_pattern
print(df)

#result
              item                regex
0  fruitmangosweet  .*sweet.*,.*fruit.*
1   fruitmangosour            .*fruit.*
2   vegtomatosweet    .*sweet.*,.*veg.*
3           potato                     

#I am getting the correct results with the code above, but it takes too long to run. My objective is to optimize the code.

Create a list of precompiled pattern objects. This avoids compiling the same patterns on every loop iteration before calling match().

rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
#array of pre compiled objects.
rgx_obj=[re.compile(i) for i in rgx_patterns]
def rgx_finder(itm):
    # a generator (not a set) keeps the patterns in rgx_patterns order
    cn_patterns=",".join(rgx.pattern for rgx in rgx_obj if rgx.match(itm))
    return cn_patterns

df['regex']=df['item'].apply(rgx_finder)
print(df)

              item                regex
0  fruitmangosweet  .*sweet.*,.*fruit.*
1   fruitmangosour            .*fruit.*
2   vegtomatosweet    .*sweet.*,.*veg.*
3           potato                     
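
If the item column contains many repeated values (an assumption; the sample data above has none), the per-item result can also be cached so each distinct string is matched only once. Below is a minimal sketch using functools.lru_cache from the standard library. The name rgx_finder_cached and the duplicated sample item are made up for illustration and are not part of the original answer.

```python
import re
from functools import lru_cache

rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]
# Compile once, outside the function, as in the answer above.
rgx_obj = [re.compile(p) for p in rgx_patterns]

@lru_cache(maxsize=None)
def rgx_finder_cached(itm):
    """Return the comma-joined patterns that match itm, in rgx_patterns order."""
    return ",".join(rgx.pattern for rgx in rgx_obj if rgx.match(itm))

# Hypothetical sample with one repeated item to show the cache being used.
items = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet",
         "potato", "fruitmangosweet"]
results = [rgx_finder_cached(itm) for itm in items]
for itm, res in zip(items, results):
    print(itm, "->", repr(res))
```

With the DataFrame from the question, the same function could be applied with df['item'].map(rgx_finder_cached). Whether this helps depends on how often values repeat: with all-unique items the cache only adds overhead.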

