
Extract all strings from a line excluding multiple regex patterns matches

I have these regex patterns, which I use to extract specific strings from text. I am using Python 3.

'\d{2}\/\d{2} ' - Extract date dd/mm

'\S+\.\d\d' - Extract amounts with 2 decimals

' \d{6} ' - Extract ref no, 6 digits

Now I want to extract whatever is left after extracting these data (for the sample text, that is "DUITNOW TRSF XXuu9876 CR ANG BENG KHOON").

What kind of regex pattern should I write?

Sample text:

"31/12 DUITNOW TRSF XXuu9876 CR 004085 ANG BENG KHOON 40,000.00 2,059,044.30"

Appreciate your help. Thanks

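Try it this way.

import re

s = "31/12 DUITNOW TRSF CR 004085 ANG BENG KHOON 40,000.00 2,059,044.30"
print(s)
s1 = re.sub(r'\d{2}\/\d{2} ', '', s)   # remove the dd/mm date
print(s1)
s2 = re.sub(r'\S+\.\d\d', '', s1)      # remove amounts with 2 decimals
print(s2)
s3 = re.sub(r'\d{6}', '', s2)          # remove the 6-digit ref no
print(s3)
# s3 is now 'DUITNOW TRSF CR  ANG BENG KHOON  '
# (the spaces that surrounded the removed fields are still there)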
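
The step-by-step substitutions leave stray spaces where the removed fields used to be. Below is a minimal sketch of one way to tidy that up, assuming it is acceptable to collapse every run of whitespace into a single space. The helper name `remove_fields` is made up for illustration, and the whitespace boundaries around the 6-digit pattern are an added assumption so that longer digit runs are left alone.

```python
import re

# The three patterns from the question. The boundaries around \d{6} are an
# assumption: they keep the pattern from matching inside longer tokens.
PATTERNS = [r'\d{2}/\d{2} ', r'\S+\.\d\d', r'(?<!\S)\d{6}(?!\S)']

def remove_fields(line):
    """Hypothetical helper: drop date, amounts and ref no, then tidy spaces."""
    for pattern in PATTERNS:
        line = re.sub(pattern, '', line)
    # str.split() with no arguments splits on whitespace runs and drops empties
    return ' '.join(line.split())

s = "31/12 DUITNOW TRSF XXuu9876 CR 004085 ANG BENG KHOON 40,000.00 2,059,044.30"
print(remove_fields(s))
# => DUITNOW TRSF XXuu9876 CR ANG BENG KHOON
```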

You can use the patterns you already have with re.split to split the string (I have revamped the patterns a bit, though):

import re
p = r'\s*(?:\d{2}\/\d{2}(?!\S)|\S+\.\d\d|(?<!\S)\d{6}(?!\S))\s*'
text = "31/12 DUITNOW TRSF CR 004085 ANG BENG KHOON 40,000.00 2,059,044.30"
print( list(filter(None, re.split(p, text))) )
# => ['DUITNOW TRSF CR', 'ANG BENG KHOON']
print( " ".join(re.split(p, text)).strip() )
# => DUITNOW TRSF CR ANG BENG KHOON


Note that the patterns are combined into a single pattern of the \s*(?:...|...|etc.)\s* type, i.e. a non-capturing group with optional whitespace patterns on both ends. The (?<!\S) and (?!\S) are whitespace boundaries: they require that the match is not immediately preceded or followed by a non-whitespace character.
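
To see what the whitespace boundaries change, here is a small sketch comparing the bare \d{6} pattern with the bounded one. The token "AB1234567" is invented purely for this illustration and does not come from the original sample.

```python
import re

# "AB1234567" is a made-up token used only to show the difference.
text = "CR 004085 AB1234567 ANG"

# Without boundaries, six digits inside a longer token also match.
print(re.findall(r'\d{6}', text))
# => ['004085', '123456']

# With whitespace boundaries, only a standalone 6-digit token matches.
print(re.findall(r'(?<!\S)\d{6}(?!\S)', text))
# => ['004085']
```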

Since matches at the start or end of the string, as well as consecutive matches, produce empty strings, the resulting list must be filtered to remove them.
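
As a quick illustration of where those empty strings come from, this sketch prints the unfiltered re.split result for the same pattern and text, then drops the empties with a list comprehension (an equivalent alternative to filter(None, ...)). The variable name `parts` is only for illustration.

```python
import re

p = r'\s*(?:\d{2}\/\d{2}(?!\S)|\S+\.\d\d|(?<!\S)\d{6}(?!\S))\s*'
text = "31/12 DUITNOW TRSF CR 004085 ANG BENG KHOON 40,000.00 2,059,044.30"

# The date at the start and the two adjacent amounts at the end
# each leave an empty string in the raw result.
parts = re.split(p, text)
print(parts)
# => ['', 'DUITNOW TRSF CR', 'ANG BENG KHOON', '', '']

# Keep only the non-empty pieces.
print([part for part in parts if part])
# => ['DUITNOW TRSF CR', 'ANG BENG KHOON']
```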
