Python regex: Extract volume (mL) from strings
I have the following strings and want to extract the volume from them (match only mL, not mg/mL).
test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
" 10ML and 15ML ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]
Here is my current pattern and the results it gives.
import re
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")
for s in test:
    print(s, '>>', pattern.findall(s))
10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> ['0'] # Wrong []
10MG/0.5ML >> ['.5'] # Wrong []
10ML and 15ML >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10', '30'] # Wrong ['10']
As you can see, I get wrong results for ["1MG/10ML", "10MG/0.5ML", "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]. They should be [[], [], ['10']].
I have tried to fix my pattern but still can't figure it out. Please help me correct it. Thanks!
See this RegExr link for details on the components of the regular expression below.
import re
test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
" 10ML and 15ML ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]
for s in test:
    print(re.findall(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b', s))
Output
['10']
['10']
['10.5']
[]
[]
[]
['10', '15']
[]
['1000']
['10']
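One thing worth checking before reusing this pattern: `\d+(?:\.?\d+)` needs at least two digits, so a single-digit volume such as "5 ML" is not matched. The sketch below illustrates that. The `relaxed` pattern is a hypothetical variant, not part of the original answer: it assumes the decimal part should be optional and that a match must not start right after a digit or a dot.

```python
import re

# Pattern from the answer above: the number needs at least two digits.
two_digit = re.compile(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b')

# Hypothetical variant (an assumption, not from the original answer):
# the decimal part is optional, and the lookbehind also rejects a preceding
# digit or dot so the match cannot start in the middle of a number.
relaxed = re.compile(r'(?<![-/\d.])(\d+(?:\.\d+)?) *ML\b')

for s in ["5 ML", "10.5ML", "1MG/10ML", "10MG/0.5ML", "FOR 1-30 ML"]:
    print(s, '>>', two_digit.findall(s), relaxed.findall(s))
```

The two patterns differ only on "5 ML" here. Note that the variant also rejects any number directly preceded by a dot, which may or may not suit your data.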
You can use
(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)
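See the Python regex demo.
Details:
(?<![/\d]) - no / or digit is allowed immediately to the left of the current position
(?<!\d[.-]) - no digit followed by . or - is allowed immediately to the left of the current position
(\d+(?:\.\d+)?) - Group 1: one or more digits, then an optional sequence of a . and one or more digits
\s* - zero or more whitespace characters
ML\b - ML as a whole word
(?!/) - no / is allowed immediately to the right of the current position
See the Python demo: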
import re
pattern = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A)
test = ["10ML", "10 ML", "10.5ML", "1MG/1ML", "1MG/10ML", "10MG/0.5ML", " 10ML and 15ML ",
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", "NSS.0.9% 1000 ML (PLASTIC BAG)",
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]
for s in test:
    print(s, '>>', pattern.findall(s))
Output:
10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> []
10MG/0.5ML >> []
10ML and 15ML >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10']
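If the pattern is going to live in a codebase, the component breakdown above can be kept next to the regex itself with `re.VERBOSE`, and the captured strings converted to numbers. This is only a sketch: the names `VOLUME_ML` and `extract_volumes` are made up for illustration, and it assumes the unit is always written in upper case, as in the test strings.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# Whitespace and "#" comments inside the pattern are ignored in this mode.
VOLUME_ML = re.compile(r"""
    (?<![/\d])        # not preceded by a slash or a digit
    (?<!\d[.-])       # not preceded by a digit plus "." or "-"
    (\d+(?:\.\d+)?)   # group 1: integer or decimal number
    \s*               # optional whitespace
    ML\b              # the unit as a whole word
    (?!/)             # not followed by a slash
""", re.VERBOSE | re.ASCII)


def extract_volumes(text):
    """Return every standalone mL volume found in text as a float."""
    return [float(v) for v in VOLUME_ML.findall(text)]


print(extract_volumes(" 10ML and 15ML "))
print(extract_volumes("10MG/0.5ML"))
```

Because group 1 only ever captures digits with at most one dot, `float()` should not raise on anything this pattern returns.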
Another one that may be easier to read:
(?<![/\d-])(\d+\.*\d+)\s*ML\b
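No output is shown for this pattern, so here is a small check against the expected results given in the question's comments. The two extra inputs at the end are assumptions of mine, not part of the question. They probe behaviour that follows from `\d+\.*\d+`: the number needs at least two digits, and `\.*` accepts any number of dots.

```python
import re

pattern = re.compile(r'(?<![/\d-])(\d+\.*\d+)\s*ML\b')

# Expected results, taken from the comments in the question.
cases = {
    "10ML": ['10'],
    "10 ML": ['10'],
    "10.5ML": ['10.5'],
    "1MG/1ML": [],
    "1MG/10ML": [],
    "10MG/0.5ML": [],
    " 10ML and 15ML ": ['10', '15'],
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION": [],
    "NSS.0.9% 1000 ML (PLASTIC BAG)": ['1000'],
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML": ['10'],
}

# Collect every input whose actual result differs from the expected one.
failures = {s: pattern.findall(s) for s, want in cases.items()
            if pattern.findall(s) != want}
print(failures or "all cases pass")

# Edge cases outside the question's test list (hypothetical inputs).
print(pattern.findall("5 ML"))     # → [] because two digits are required
print(pattern.findall("10..5ML"))  # → ['10..5'] because \.* allows repeated dots
```

If single-digit volumes or malformed numbers can occur in your data, a stricter number group such as `\d+(?:\.\d+)?` may be a safer choice.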