Skipping matches in a regular expression

Question

I am trying to extract some numeric values from a text. Skipping is done based on the matching text. For example:

      Input Text - 
      ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST# 36479 GST percentage is 20%.
      OR
      ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg No. 36479 GST% is 20%.
      OR
      ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg# 36479 GST% is 20%.

      Output Text -
      Amount 400.00
      GST 36479
      GST 20%

The main point is that the input text can be in any format, but the output text should be the same. What stays constant is that the GST number is a non-decimal number, the GST percentage is a number followed by a "%" symbol, and the amount is in decimal form.

I tried, but I am not able to skip the non-numeric characters after GST. Please help.

What I tried:

              pattern = re.compile(r"\b(?<=GST).\D(\d+)")

Answer 1

You can use

\bAmount\s*(?P<amount>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)

See the regex demo. Details:

\bAmount\s* - the whole word Amount followed by zero or more whitespace characters
(?P<amount>\d+(?:\.\d+)?) - Group "amount": one or more digits, then an optional sequence of . and one or more digits
.*? - any zero or more characters other than line break characters, as few as possible
\bGST - GST at the start of a word
\D* - zero or more characters other than digits
(?P<gst_id>\d+(?:\.\d+)?) - Group "gst_id": one or more digits, then an optional sequence of . and one or more digits
.*? - any zero or more characters other than line break characters, as few as possible
\bGST\D* - GST at the start of a word, then zero or more characters other than digits
(?P<gst_prcnt>\d+(?:\.\d+)?%) - Group "gst_prcnt": one or more digits, then an optional sequence of . and one or more digits, and then a % character.
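To see why \D* is the piece that does the skipping, here is a small comparison sketch. It runs the pattern from the question next to a reduced \bGST\D*(\d+) pattern. The names attempt, skipping and samples are illustrative only, and the sample strings are shortened versions of the inputs in the question.

```python
import re

# Pattern from the question: "." and "\D" each consume exactly one character,
# so only two characters may sit between "GST" and the digits.
attempt = re.compile(r"\b(?<=GST).\D(\d+)")

# "\D*" consumes any run of non-digit characters after "GST".
skipping = re.compile(r"\bGST\D*(\d+)")

# Shortened versions of the question's inputs (illustrative).
samples = [
    "with GST# 36479 GST percentage is 20%.",
    "with GST Reg No. 36479 GST% is 20%.",
]

for s in samples:
    a = attempt.search(s)
    b = skipping.search(s)
    print(a.group(1) if a else None, b.group(1) if b else None)
```

The first sample matches with both patterns because "# " happens to be exactly two characters. The second sample only matches with \D*, since " Reg No. " is longer than two characters.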

See the Python demo:

import re
pattern = r"\bAmount\s*(?P<amount>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)"

texts = ["ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST# 36479 GST percentage is 20%.",
"ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg No. 36479 GST% is 20%.",
"ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg# 36479 GST% is 20%."]

for text in texts:
    m = re.search(pattern, text)
    if m:
        print(m.groupdict())

Output:

{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
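The question asks for three output lines rather than a dictionary. A minimal sketch of that last step, assuming the same pattern as above, might look like this. format_fields is a hypothetical helper name, not part of the original answer.

```python
import re

# Same pattern as in the answer, split across lines for readability.
PATTERN = re.compile(
    r"\bAmount\s*(?P<amount>\d+(?:\.\d+)?)"
    r".*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?)"
    r".*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)"
)

def format_fields(text):
    """Hypothetical helper: return the three output lines, or None if there is no match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return "\n".join([
        f"Amount {m['amount']}",
        f"GST {m['gst_id']}",
        f"GST {m['gst_prcnt']}",
    ])

text = "ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg No. 36479 GST% is 20%."
print(format_fields(text))
```

Returning None for non-matching input keeps the caller in charge of how to handle lines that do not follow the expected order of Amount, GST number and GST percentage.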

Answer 1: 2, accepted, 2021-05-07 12:26:12
