
Python regex: Extract volume (mL) from strings

I have the following strings and want to extract the volume from each (match only mL, not mg/mL):

test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
"   10ML and 15ML  ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]

Here are my current pattern and its results.

import re

# Raw string so that \/ and \s reach the regex engine unchanged
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")

for i, s in enumerate(test):
    print(test[i], '>>' , pattern.findall(s))

10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> ['0'] # wrong, expected []
10MG/0.5ML >> ['.5'] # wrong, expected []
   10ML and 15ML   >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10', '30'] # wrong, expected ['10']

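As you can see, I get wrong results for ["1MG/10ML", "10MG/0.5ML", "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]. The results should be [], [] and ['10'] respectively.

I have tried to fix my pattern but still can't figure it out. Please help me correct it. Thanks!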

See this RegExr link for details on the components of the regex below.

import re

test = [
    "10ML", # 10
    "10 ML", # 10
    "10.5ML", # 10.5
    "1MG/1ML", # [] not match
    "1MG/10ML", # [] not match
    "10MG/0.5ML", # [] not match
    "   10ML and 15ML  ", # 10, 15
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
    "NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]

for s in test:
    # print the matches; a bare findall() call inside a loop displays nothing
    print(re.findall(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b', s))

Output

['10']
['10']
['10.5']
[]
[]
[]
['10', '15']
[]
['1000']
['10']

You can use

(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)

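See the Python regex demo.

Details

  • (?<![/\d]) - no / or digit is allowed immediately to the left of the current location
  • (?<!\d[.-]) - no digit followed by . or - is allowed immediately to the left of the current location
  • (\d+(?:\.\d+)?) - Group 1: one or more digits, optionally followed by a . and one or more digits
  • \s* - zero or more whitespace characters
  • ML\b - ML followed by a word boundary, so ML is matched as a whole word
  • (?!/) - no / is allowed immediately to the right of the current location.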
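
To make the breakdown above easier to follow next to the code, here is a minimal sketch of the same pattern written with re.VERBOSE, so each component carries its own comment. The names VOLUME_RE and extract_volumes are made up for this illustration and are not part of the answer; converting the matches to float is also an assumption about how the values would be used.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
VOLUME_RE = re.compile(
    r"""
    (?<![/\d])        # not preceded by "/" or a digit
    (?<!\d[.-])       # not preceded by a digit followed by "." or "-"
    (\d+(?:\.\d+)?)   # group 1: integer or decimal number
    \s*               # optional whitespace
    ML\b              # the unit ML as a whole word
    (?!/)             # not followed by "/"
    """,
    re.VERBOSE | re.ASCII,
)


def extract_volumes(text):
    """Hypothetical helper: return each standalone mL volume in text as a float."""
    return [float(value) for value in VOLUME_RE.findall(text)]


if __name__ == "__main__":
    for sample in ["10.5ML", "1MG/10ML", "   10ML and 15ML  "]:
        print(repr(sample), ">>", extract_volumes(sample))
```

The second lookbehind is the part that rejects 0.5ML after a slash and 30 ML in 1-30 ML: in both cases the digit that would start the match is preceded by a digit plus . or -.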

See the Python demo:

import re
pattern = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A)
test = ["10ML", "10 ML", "10.5ML", "1MG/1ML", "1MG/10ML", "10MG/0.5ML", "   10ML and 15ML  ",
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", "NSS.0.9% 1000 ML (PLASTIC BAG)", 
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]
for i, s in enumerate(test):
    print(test[i], '>>' , pattern.findall(s))

Output:

10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> []
10MG/0.5ML >> []
   10ML and 15ML   >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10']

Another one that may be easier to read:

(?<![/\d-])(\d+\.*\d+)\s*ML\b
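
One thing worth checking before adopting this shorter variant: \d+\.*\d+ needs at least two digits, so it behaves differently from the longer pattern on inputs that are not in the question's test list. The sketch below compares the two; the sample strings and the names short and full are made up for this illustration.

```python
import re

# The shorter variant and the longer pattern from the previous answer.
short = re.compile(r'(?<![/\d-])(\d+\.*\d+)\s*ML\b')
full = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A)

# Hypothetical inputs, not taken from the question.
samples = ["5 ML", "10 ML", "2.5ML", "10MG/0.25ML"]

for s in samples:
    print(f"{s!r}: short={short.findall(s)} full={full.findall(s)}")
```

With these inputs the shorter pattern skips the single-digit 5 ML and picks up 25 from 10MG/0.25ML, while the longer pattern returns 5 and nothing respectively. If the data can contain such strings, the longer pattern is the safer choice.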

