Regex for extracting text after a word and before a special character, excluding all numeric tokens
I'm trying to write a regex which, for a given example text:
Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)
Desired output:
Minimum Rent Schedule (subiect to adjustment, if applicable)
captures everything between the word 'Section' and up until the special character ':'. But, as in the example here, I don't want it to capture anything with numbers.
What I have tried so far is:
[Section]+.*[:]
This is one pattern.
Example:
import re
s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(r"Section[\d.\s]+(.*?):", s).group(1))
Output:
Minimum Rent Schedule (subiect to adjustment, if applicable)
If you have multiple elements, use re.findall.
Example:
print(re.findall(r"Section[\d.\s]+(.*?):", your_text))
The pattern you tried uses a character class, which matches any of the listed characters 1+ times: [Section]+ matches one or more of the characters S, e, c, t, i, o, n in any order, not the word Section.
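As a rough illustration of why the character class does not do what was intended, the sketch below runs it against a made-up string and a shortened version of the sample text. Both input strings are assumptions chosen for illustration; only the pattern itself comes from the question.

```python
import re

# A character class matches single characters, not the word as a whole:
# [Section]+ means "one or more of S, e, c, t, i, o, n, in any order".
words = re.findall(r"[Section]+", "notice Section into")
print(words)

# Shortened sample text (an assumption, not the full text from the question).
sample = "Section 2.1. 1.1.14. Minimum Rent Schedule:less than twelve (12) months"

# The attempted pattern has no capture group, so the match includes
# "Section" itself and the numeric tokens that should be excluded.
attempt = re.search(r"[Section]+.*[:]", sample).group(0)
print(attempt)
```

The first call returns all three words, since each consists only of characters from the class. The second shows that the match still contains the word Section and the numbers.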
To skip anything that contains numbers after Section, you could repeat 0+ times matching a run of non-whitespace characters that contains at least one digit, followed by a space.
Then capture in a group what follows, up to the first colon.
Section (?:[^\s\d]*\d\S* )*([^:]+):
Explanation
Section      Match Section followed by a space
(?:      Start a non-capturing group
[^\s\d]*      Match any char except a whitespace char or a digit, 0+ times, using a negated character class
\d\S*      Then match a digit followed by 0+ non-whitespace chars, then a space
)*      Close the group and repeat it 0+ times
([^:]+):      Capture in group 1 any char except a : 1+ times, then match a :
For example:
import re
regex = r"Section (?:[^\s\d]*\d\S* )*([^:]+):"
s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(regex, s).group(1))
Result:
Minimum Rent Schedule (subiect to adjustment, if applicable)
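One thing to watch for: re.match returns None when the text does not match, so calling .group(1) directly raises AttributeError. A minimal sketch of a guarded version follows; the helper name extract_title and the shortened input strings are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Same pattern as in the answer above, compiled once for reuse.
SECTION_RE = re.compile(r"Section (?:[^\s\d]*\d\S* )*([^:]+):")

def extract_title(text: str) -> Optional[str]:
    """Return the captured title, or None when the text does not match.

    extract_title is a hypothetical helper name, not part of the re module.
    """
    m = SECTION_RE.match(text)
    return m.group(1) if m else None

# Shortened sample inputs (assumptions for illustration).
print(extract_title("Section 2.1. 1.1.14. Minimum Rent Schedule:less than twelve (12) months"))
print(extract_title("No heading here"))
```

The first call prints the title and the second prints None instead of raising.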
To find multiple matches, you could use re.findall:
print(re.findall(regex, s))
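To see re.findall return more than one result, here is a small sketch that runs both patterns from this page over a text with two section headings. The two-line text is invented for illustration and is not from the question.

```python
import re

# Invented two-section text (an assumption for illustration).
text = (
    "Section 1.1. Rent Schedule:due monthly\n"
    "Section 2.3.4. Security Deposit (if any):held by landlord"
)

# Pattern from the first answer: skip digits, dots and whitespace,
# then lazily capture up to the next colon.
pattern_a = r"Section[\d.\s]+(.*?):"

# Pattern from the second answer: skip space-separated tokens that
# contain a digit, then capture everything up to the next colon.
pattern_b = r"Section (?:[^\s\d]*\d\S* )*([^:]+):"

titles_a = re.findall(pattern_a, text)
titles_b = re.findall(pattern_b, text)
print(titles_a)
print(titles_b)
```

With a single capture group, re.findall returns a list of the group 1 strings, one per match, and on this input both patterns give the same two titles.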