Python regex: negative lookahead to delete or keep digits at the beginning
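The goal is to keep cardinal and ordinal numbers at the beginning of a string, as long as they come immediately before the word PERFORMANCE or SCORE: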
#These numbers are kept:
100 SCORE FOR STUDENT
80 PERFORMANCE FOR TEACHER
However, if the numbers are at the beginning and are followed by any other word, they should be removed:
#These numbers are removed
10095TH 10097TH 179TH SCHOOL ANIVERSARY
11 12 10 SECONDARY LEVELS
100 100 100 100 SCHOOL AGREEMENT
The problem I am running into is when the word PERFORMANCE or SCORE is preceded by several numbers separated by spaces:
#All numbers should be kept
3 10 100 PERFORMANCE
001 10 12345 SCORE
I am applying the regex below, but its last part, (?!\s*\d*\s*\d*\s*(?:PERFORMANCE|SCORE)\b), is messy: as written it only accounts for up to 3 groups of numbers to keep before PERFORMANCE or SCORE:
(?<=[A-Za-z]\b )([ 0-9]*(ST|[RN]D|TH)?\b)|^(([\d ]+(ST|[RN]D|TH)?)*\b)(?!\s*\d*\s*\d*\s*(?:PERFORMANCE|SCORE)\b)
The regex above works for the following cases:
3 10 100 PERFORMANCE
001 10 12345 SCORE
But if I add one more group of numbers, it no longer works:
3 10 100 1 PERFORMANCE
001 10 1 12345 SCORE
How can I generalize this rule so that it covers any number of number groups?
Thanks
Please try the following:
^(?:\d+(?:ST|[RN]D|TH)?\s)+(?=[^\d]+$)(?!PERFORMANCE|SCORE)
^ anchor to beginning
(?: start non-capturing group
\d+ match one or more digits
(?:ST|[RN]D|TH)? optionally followed by one of your approved suffixes
\s then a whitespace
)+ one or more times
(?=[^\d]+$) assert that the rest of the line is number-free (forces the regex to not backtrack to the last number)
(?!PERFORMANCE|SCORE) assert that the following characters are NOT 'PERFORMANCE' or 'SCORE'
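
As a quick check, the pattern above can be run through Python's re.sub against the sample lines from the question. This is only a minimal sketch: the names LEADING_NUMBERS and strip_leading_numbers are made up for illustration and do not come from the original answer, and it assumes each line is processed as a separate string.

```python
import re

# Pattern from the answer, written as a raw string so the backslashes
# reach the regex engine unchanged.
LEADING_NUMBERS = re.compile(
    r"^(?:\d+(?:ST|[RN]D|TH)?\s)+(?=[^\d]+$)(?!PERFORMANCE|SCORE)"
)


def strip_leading_numbers(line: str) -> str:
    """Drop leading numbers unless PERFORMANCE or SCORE follows them.

    Hypothetical helper for illustration; expects one line per call.
    """
    return LEADING_NUMBERS.sub("", line)


samples = [
    "100 SCORE FOR STUDENT",
    "3 10 100 1 PERFORMANCE",
    "001 10 1 12345 SCORE",
    "10095TH 10097TH 179TH SCHOOL ANIVERSARY",
    "11 12 10 SECONDARY LEVELS",
    "100 100 100 100 SCHOOL AGREEMENT",
]

for sample in samples:
    print(repr(strip_leading_numbers(sample)))
```

One consequence of the (?=[^\d]+$) assertion is worth keeping in mind: if a digit appears anywhere later in the line (for example "11 12 SCHOOL 5 LEVELS"), the pattern does not match at all and the leading numbers are left in place. Whether that is acceptable depends on the data.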