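Get all the regexes from a list of regex patterns that match a given string
I have a list of regex patterns that I want to match against a string. How can I find all the regex patterns that match the string? I tried a for loop that takes the patterns one at a time and matches each against the string, but this becomes very slow when processing a large amount of data.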
import pandas as pd
fruits=["fruitmangosweet","fruitmangosour","vegtomatosweet","potato"]
df=pd.DataFrame(columns=["item","regex"])  # a list (not a set) keeps the column order stable
df["item"]=fruits
#dataframe in hand
item regex
0 fruitmangosweet NaN
1 fruitmangosour NaN
2 vegtomatosweet NaN
3 potato NaN
rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
#desired output
item regex
0 fruitmangosweet .*sweet.*,.*fruit.*
1 fruitmangosour .*fruit.*
2 vegtomatosweet .*sweet.*,.*veg.*
3 potato NaN
#Tried
import re
for itm in df.item.values:
    conct_pattern=",".join(rgx for rgx in rgx_patterns if re.match(rgx,itm))
    df.loc[df['item']==itm,'regex']=conct_pattern
print(df)
#result
item regex
0 fruitmangosweet .*sweet.*,.*fruit.*
1 fruitmangosour .*fruit.*
2 vegtomatosweet .*sweet.*,.*veg.*
3 potato
# I am getting the proper results with the above code, but it takes too much time to process. My objective is to optimize the code.
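Create an array of precompiled pattern objects. This avoids compiling the same patterns on every loop iteration before calling match(). The function is then applied to the whole column at once, so no per-item df.loc lookup is needed.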
rgx_patterns=[".*sweet.*",".*fruit.*",".*veg.*"]
# array of precompiled pattern objects
rgx_obj=[re.compile(i) for i in rgx_patterns]
def rgx_finder(itm):
    # a set is used here, so the order of the joined patterns is not guaranteed
    cn_patterns=",".join({rgx.pattern for rgx in rgx_obj if rgx.match(itm)})
    return cn_patterns
df['regex']=df['item'].apply(rgx_finder)
print(df)
regex item
0 .*fruit.*,.*sweet.* fruitmangosweet
1 .*fruit.* fruitmangosour
2 .*veg.*,.*sweet.* vegtomatosweet
3 potato
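As a possible further speed-up (not part of the original answer), note that a pattern of the form `.*x.*` used with `match()` finds the same single-line strings as `x` used with `search()`, and the shorter pattern avoids backtracking. The sketch below assumes that every pattern simply wraps a core expression in a leading and trailing `.*` and that the items contain no newlines; `strip_wildcards` and `rgx_finder_fast` are hypothetical helper names introduced only for illustration.

```python
import re

rgx_patterns = [".*sweet.*", ".*fruit.*", ".*veg.*"]

def strip_wildcards(pattern):
    # Hypothetical helper: drop a leading and a trailing ".*" if present.
    # Assumes the trailing ".*" is not part of an escaped sequence.
    core = pattern
    if core.startswith(".*"):
        core = core[2:]
    if core.endswith(".*"):
        core = core[:-2]
    return core

# Pair each original pattern with a compiled, simplified version of it.
compiled = [(p, re.compile(strip_wildcards(p))) for p in rgx_patterns]

def rgx_finder_fast(itm):
    # A generator over the list keeps the patterns in their original order.
    return ",".join(p for p, rx in compiled if rx.search(itm))

items = ["fruitmangosweet", "fruitmangosour", "vegtomatosweet", "potato"]
results = [rgx_finder_fast(i) for i in items]
print(results)
```

With a DataFrame, the same function could be used as `df['regex']=df['item'].apply(rgx_finder_fast)`. Whether this is actually faster depends on the data and the patterns, so it is worth timing both versions.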