Skipping a match in regex
I am trying to extract some numeric values from a text. What gets skipped depends on the text that is matched. For example:
Input Text -
ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST# 36479 GST percentage is 20%.
OR
ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg No. 36479 GST% is 20%.
OR
ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg# 36479 GST% is 20%.
Output Text -
Amount 400.00
GST 36479
GST 20%
The main point is that the input text can be in any format, but the output text should stay the same. What stays constant is that the GST number is a non-decimal number, the GST percentage is a number followed by a "%" symbol, and the amount is in decimal form.
I tried, but I am not able to skip the non-numeric value after GST. Please help.
What I tried:
pattern = re.compile(r"\b(?<=GST).\D(\d+)")
You can use
\bAmount\s*(?P<amount>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)
See the regex demo. Details:
- \bAmount\s* - a whole word Amount and zero or more whitespaces
- (?P<amount>\d+(?:\.\d+)?) - Group "amount": one or more digits and then an optional sequence of . and one or more digits
- .*? - any zero or more chars other than line break chars, as few as possible
- \bGST - a word GST
- \D* - zero or more chars other than digits
- (?P<gst_id>\d+(?:\.\d+)?) - Group "gst_id": one or more digits and then an optional sequence of . and one or more digits
- .*? - any zero or more chars other than line break chars, as few as possible
- \bGST\D* - a word GST and then zero or more chars other than digits
- (?P<gst_prcnt>\d+(?:\.\d+)?%) - Group "gst_prcnt": one or more digits and then an optional sequence of . and one or more digits, and then a % char.
See the Python demo:
import re

# Named groups capture the amount, the GST number and the GST percentage.
pattern = r"\bAmount\s*(?P<amount>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?).*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)"
texts = ["ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST# 36479 GST percentage is 20%.",
         "ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg No. 36479 GST% is 20%.",
         "ABC Company Export Items 4 Bought by XYZ Amount 400.00 with GST Reg# 36479 GST% is 20%."]
for text in texts:
    m = re.search(pattern, text)
    if m:
        # groupdict() maps each group name to its captured text
        print(m.groupdict())
Output:
{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
{'amount': '400.00', 'gst_id': '36479', 'gst_prcnt': '20%'}
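If you need the exact three-line output from the question rather than a dictionary, the captured groups can be formatted afterwards. The following is only a minimal sketch: the helper name format_gst_fields is hypothetical, and it assumes each input contains a single Amount / GST number / GST percentage triple on one line.

```python
import re

# Same pattern as in the answer above, compiled once and split for readability.
PATTERN = re.compile(
    r"\bAmount\s*(?P<amount>\d+(?:\.\d+)?)"
    r".*?\bGST\D*(?P<gst_id>\d+(?:\.\d+)?)"
    r".*?\bGST\D*(?P<gst_prcnt>\d+(?:\.\d+)?%)"
)

def format_gst_fields(text):
    """Hypothetical helper: return the three output lines, or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return "\n".join([
        f"Amount {m.group('amount')}",
        f"GST {m.group('gst_id')}",
        f"GST {m.group('gst_prcnt')}",
    ])

sample = ("ABC Company Export Items 4 Bought by XYZ Amount 400.00 "
          "with GST Reg No. 36479 GST% is 20%.")
print(format_gst_fields(sample))
```

Returning None on a failed match leaves it to the caller to decide how to handle inputs that do not follow the expected layout.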