Skipping matches in a regular expression

Question

I am trying to extract some values from text. What has to be skipped depends on the text being matched. For example:

      Input Text - 
      ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST# 36479 GST percentage is 20%.
      OR
      ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg No. 36479 GST% is 20%.
      OR
      ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg# 36479 GST% is 20%.

      Output Text -
      Amount 400.00
      GST 36479
      GST 20%

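The point is that the input text can come in any of these formats, but the output text should be the same. What stays constant is that the GST number is a non-decimal number (an integer), the GST percentage is a number followed by a "%" sign, and the amount is in decimal form.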

I tried, but could not skip the non-digit characters that follow GST. Please help.

What I have tried:

              pattern = re.compile(r"\b(?<=GST).\D(\d+)")

Answer 1

You can use

\bAmount\s*(?P<amount>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)

Details:

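\bAmount\s* - the word Amount, preceded by a word boundary, then zero or more whitespace characters
(?P<amount>\d+(?:\.\d+)?) - group "amount": one or more digits, then an optional . followed by one or more digits
.*? - any zero or more characters other than line breaks, as few as possible
\bGST - the word GST, preceded by a word boundary
\D* - zero or more characters other than digits
(?P<gst_id>\d+(?:\.\d+)?) - group "gst_id": one or more digits, then an optional . followed by one or more digits
.*? - any zero or more characters other than line breaks, as few as possible
\bGST\D* - the word GST, then zero or more characters other than digits
(?P<gst_prcnt>\d+(?:\.\d+)?%) - group "gst_prcnt": one or more digits, then an optional . followed by one or more digits, then a % character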
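
If it helps to keep the breakdown next to the pattern itself, the same regex can be written with re.VERBOSE so that every part carries its own comment. This is only a sketch of that layout; the name PATTERN is an assumption made for the example and does not appear in the question.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored and "#" starts a comment;
# that is safe here because the pattern has no literal spaces or "#" in it.
PATTERN = re.compile(
    r"""
    \bAmount\s*                      # the word Amount, then optional whitespace
    (?P<amount>\d+(?:\.\d+)?)        # amount: digits with an optional decimal part
    .*?                              # as few characters as possible (no line breaks)
    \bGST\D*                         # the word GST, then any run of non-digits
    (?P<gst_id>\d+(?:\.\d+)?)        # GST number
    .*?                              # as few characters as possible (no line breaks)
    \bGST\D*                         # the second GST, then any run of non-digits
    (?P<gst_prcnt>\d+(?:\.\d+)?%)    # percentage, including the % sign
    """,
    re.VERBOSE,
)

m = PATTERN.search(
    "ABC Company Export Items 4 Bought by XYZ Amount 400.00 "
    "with GST Reg No. 36479 GST% is 20%."
)
if m:
    print(m.groupdict())
```

The \D* after each GST is what skips "#", " Reg No. " or "% is " regardless of which wording the input uses.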

Python demo:

import re
pattern = r"\bAmount\s*(?P<amount>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)"

texts = ["ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST# 36479 GST percentage is 20%.",
"ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg No. 36479 GST% is 20%.",
"ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg# 36479 GST% is 20%."]

for text in texts:
    m = re.search(pattern, text)
    if m:
        print(m.groupdict())

Output:

{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
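
To get the exact three-line output shown in the question rather than a dict, the named groups can be fed into a format string. The helper name format_fields below is hypothetical, and returning None for text that does not match is an assumption about how you want to handle that case.

```python
import re

PATTERN = re.compile(
    r"\bAmount\s*(?P<amount>\d+(?:\.\d+)?)"
    r".*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?)"
    r".*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)"
)

def format_fields(text):
    """Return the 'Amount / GST / GST' lines, or None if the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # groupdict() maps each named group to its captured string.
    return "Amount {amount}\nGST {gst_id}\nGST {gst_prcnt}".format(**m.groupdict())

print(format_fields(
    "ABC Company Export Items 4 Bought by XYZ Amount 400.00 "
    "with GST Reg# 36479 GST% is 20%."
))
```

For the sample text this prints Amount 400.00, GST 36479 and GST 20% on separate lines.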

Solution 1
2, accepted, 2021-05-07 12:26:12
