
Regex for extracting text after a word and before a special character, excluding anything numeric

I'm trying to write a regex for the following example text:

Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)

Desired output:

Minimum Rent Schedule (subiect to adjustment, if applicable)

That is, everything between the word 'Section' and the special character ':'. But, as in this example, I don't want it to capture any of the parts that contain numbers.

What I have tried so far is:

[Section]+.*[:]

This is one pattern.

Ex:

import re

s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(r"Section[\d.\s]+(.*?):", s).group(1))

Output:

Minimum Rent Schedule (subiect to adjustment, if applicable)

If you have multiple elements, use re.findall.

Ex:

print(re.findall(r"Section[\d.\s]+(.*?):", your_text))
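As a small illustration of the re.findall call above, here is a sketch run on a made-up two-line input. The contents of your_text are an assumption for demonstration only and do not come from the original question.

```python
import re

# Hypothetical input with two "Section" entries, one per line.
your_text = (
    "Section 1.1. Term: five years\n"
    "Section 2.1. 1.1.14. Minimum Rent Schedule: see below"
)

# One captured title per "Section ... :" occurrence.
titles = re.findall(r"Section[\d.\s]+(.*?):", your_text)
print(titles)  # ['Term', 'Minimum Rent Schedule']
```

Because the pattern has exactly one capturing group, re.findall returns a list of the captured strings rather than the full matches.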

The pattern you tried uses a character class, [Section]+, which matches any of the listed characters (S, e, c, t, i, o, n) 1+ times in any order, rather than the literal word Section.
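A quick way to see the character-class behaviour is the sketch below. The test strings are made up for illustration.

```python
import re

# [Section]+ is a set of characters, not the word "Section":
# it accepts any run made only of S, e, c, t, i, o, n.
char_class = re.compile(r"[Section]+")
print(bool(char_class.fullmatch("Section")))  # True
print(bool(char_class.fullmatch("notice")))   # True: same letters, different word
print(bool(char_class.fullmatch("Rent")))     # False: "R" is not in the set

# The greedy .* in the attempted pattern also runs to the LAST colon.
attempt = re.search(r"[Section]+.*[:]", "Section 1. Title: a: b")
print(attempt.group(0))  # Section 1. Title: a:
```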

To skip anything that contains numbers after Section, you could match, 0+ times, a run of non-whitespace characters that contains at least one digit, followed by a space.

Then capture what follows, up to the colon, in a group.

Section (?:[^\s\d]*\d\S* )*([^:]+):

Explanation

  • Section Match Section and a space
  • (?: Non-capturing group
    • [^\s\d]* Match any char except a whitespace char or a digit, 0+ times, using a negated character class
    • \d\S* Then match a digit, followed by 0+ non-whitespace chars and a single space
  • )* Close the group and repeat it 0+ times
  • ([^:]+): Capture in group 1 any char except a : , 1+ times, then match a :
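The breakdown above can also be written directly into the pattern using re.VERBOSE. This is only a sketch of the same regex; the name VERBOSE_PATTERN is made up here, and the literal spaces are written as [ ] because verbose mode ignores unescaped whitespace outside character classes.

```python
import re

VERBOSE_PATTERN = re.compile(r"""
    Section[ ]        # the literal word "Section" followed by one space
    (?:               # non-capturing group: one token that contains a digit
        [^\s\d]*      #   0+ chars that are neither whitespace nor digits
        \d            #   at least one digit
        \S*           #   the rest of the token
        [ ]           #   the space that ends the token
    )*                # skip 0+ such tokens
    ([^:]+)           # group 1: everything up to the next colon
    :                 # the terminating colon
""", re.VERBOSE)

s = ("Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, "
     "if applicable):less than or greater than twelve (12) full calendar months")
print(VERBOSE_PATTERN.match(s).group(1))
```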


For example:

import re

regex = r"Section (?:[^\s\d]*\d\S* )*([^:]+):"
s = "Section 2.1. 1.1.14. Minimum Rent Schedule (subiect to adjustment, if applicable):less than or greater than twelve (12) full calendar months (and such proration or adjustment being based upon the actual number of days in such Lease Year)"
print(re.match(regex, s).group(1))

Result:

Minimum Rent Schedule (subiect to adjustment, if applicable)
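Note that re.match only matches at the start of the string and returns None when nothing matches, so calling .group(1) directly can raise AttributeError. A minimal sketch of a safer wrapper follows; the helper name extract_section_title and the sample strings are assumptions, not part of the original answer.

```python
import re
from typing import Optional

SECTION_TITLE = re.compile(r"Section (?:[^\s\d]*\d\S* )*([^:]+):")

def extract_section_title(text: str) -> Optional[str]:
    """Return the text after 'Section' and its numbering, or None if absent."""
    # re.search (unlike re.match) also finds "Section" in the middle of a string.
    match = SECTION_TITLE.search(text)
    return match.group(1) if match else None

print(extract_section_title("See Section 2.1. Minimum Rent Schedule: details"))
# Minimum Rent Schedule
print(extract_section_title("no section heading here"))
# None
```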

To find multiple matches, you could use re.findall:

print(re.findall(regex, s))

