
Regular expression: Match everything after a particular word

I am using Python and would like to match all the words after test until a period (full stop) or space is encountered.

text = "test : match this."

At the moment, I am using:

import re
re.match('(?<=test :).*',text)

The above code doesn't match anything. I need "match this" as my output.

You need to use re.search, since re.match only tries to match from the beginning of the string. To match until a space or period is encountered:

re.search(r'(?<=test :)[^.\s]*',text)

To match all the characters until a period is encountered:

re.search(r'(?<=test :)[^.]*',text)
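A quick check of the two patterns above against the sample string from the question may help. This is only a sketch: the variable names are illustrative, and the last pattern, which moves the space into the lookbehind, is an assumed variation that does not appear in the original answer. Note that with this particular input the space-or-period pattern returns an empty string, because a space immediately follows "test :".

```python
import re

text = "test : match this."

# re.match only tries position 0, where the lookbehind cannot succeed.
match_result = re.match(r'(?<=test :).*', text)

# re.search scans the whole string. A space follows "test :" directly,
# so the class [^.\s]* matches zero characters here.
stop_at_space_or_period = re.search(r'(?<=test :)[^.\s]*', text).group()

# Stopping only at a period keeps the leading space.
stop_at_period = re.search(r'(?<=test :)[^.]*', text).group()

# Assumed variation: include the space in the lookbehind to get the first word.
first_word = re.search(r'(?<=test : )[^.\s]*', text).group()

print(match_result)                    # None
print(repr(stop_at_space_or_period))   # ''
print(repr(stop_at_period))            # ' match this'
print(repr(first_word))                # 'match'
```

Calling .strip() on the second result is one simple way to drop the leading space.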

Everything after test, including test:

test.*

Everything after test, without test:

(?<=test).*
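As a small illustration of the difference, here are both patterns run through re.search on the question's sample string (the variable names are just for this sketch):

```python
import re

text = "test : match this."

# The plain pattern includes the word "test" in the match.
with_word = re.search(r'test.*', text).group()

# The lookbehind asserts that "test" precedes the match without consuming it.
without_word = re.search(r'(?<=test).*', text).group()

print(repr(with_word))     # 'test : match this.'
print(repr(without_word))  # ' : match this.'
```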

Example here on regexr.com

In the general case, as the title mentions, you can use a (.*) group to capture any zero or more characters other than a newline after any pattern(s) you want:

import re
p = re.compile(r'test\s*:\s*(.*)')
s = "test : match this."
m = p.search(s)           # Run a regex search anywhere inside a string
if m:                     # If there is a match
    print(m.group(1))     # Print Group 1 value

If you want . to match across multiple lines, compile the regex with the re.DOTALL (or re.S) flag, or add (?s) at the start of the pattern:

p = re.compile(r'test\s*:\s*(.*)', re.DOTALL)
p = re.compile(r'(?s)test\s*:\s*(.*)')
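To see what the flag changes, the following sketch uses a made-up two-line input (not from the question) and compares the default behaviour with both DOTALL spellings:

```python
import re

# Hypothetical two-line input, used only for this illustration.
s = "test : first line\nsecond line."

# By default, . stops at the newline.
default = re.search(r'test\s*:\s*(.*)', s).group(1)

# With re.DOTALL (alias re.S), . also matches newlines.
dotall = re.search(r'test\s*:\s*(.*)', s, re.DOTALL).group(1)

# The inline (?s) flag has the same effect.
inline = re.search(r'(?s)test\s*:\s*(.*)', s).group(1)

print(repr(default))  # 'first line'
print(repr(dotall))   # 'first line\nsecond line.'
print(dotall == inline)  # True
```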

However, it will return "match this." with the trailing period. See also a regex demo.

You can add a \. pattern after (.*) to make the regex engine stop before the last . on that line:

test\s*:\s*(.*)\.

Watch out for re.match(), since it only looks for a match at the beginning of the string (Avinash already pointed that out, but it is a very important note!).

See the regex demo and a sample Python code snippet:

import re
p = re.compile(r'test\s*:\s*(.*)\.')
s = "test : match this."
m = p.search(s)           # Run a regex search anywhere inside a string
if m:                     # If there is a match
    print(m.group(1))     # Print Group 1 value

If you want to make sure test is matched as a whole word, add \b before it (do not remove the r prefix from the string literal, or '\b' will match a BACKSPACE char!): r'\btest\s*:\s*(.*)\.'
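The effect of the word boundary and of the raw-string prefix can be checked with a short sketch. The "contest" input is a made-up example, not from the question, chosen because it contains test inside a longer word:

```python
import re

with_boundary = re.compile(r'\btest\s*:\s*(.*)\.')
without_boundary = re.compile(r'test\s*:\s*(.*)\.')

# Both patterns match when "test" is a standalone word.
whole_word = with_boundary.search("test : match this.").group(1)

# Hypothetical input where "test" is only part of a longer word.
embedded = "contest : match this."
boundary_result = with_boundary.search(embedded)              # no match
no_boundary_result = without_boundary.search(embedded).group(1)

# Without the r prefix, '\b' is a single backspace character (0x08),
# so the pattern looks for a literal backspace followed by "test".
backspace_result = re.search('\btest', "test : match this.")

print(repr(whole_word))          # 'match this'
print(boundary_result)           # None
print(repr(no_boundary_result))  # 'match this'
print('\b' == '\x08')            # True
print(backspace_result)          # None
```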

I don't see why you want to use regex if you're just getting a substring from a string.

This works the same way:

if line.startswith('test:'):
    print(line[5:line.find('.')])

Example:

>>> line = "test: match this."
>>> print(line[5:line.find('.')])
 match this

Regex is slow, awkward to design, and difficult to debug. There are definitely occasions to use it, but if you just want to extract the text between test: and ., then I don't think this is one of those occasions.

See: https://softwareengineering.stackexchange.com/questions/113237/when-you-should-not-use-regular-expressions

For more flexibility (for example, if you are looping through a list of strings that you want to find at the beginning of a line and then slice off), replace 5 (the length of 'test:') in the slice with len(str_you_looked_for).
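One way this generalization might look is sketched below. The function name text_after_prefix and its terminator parameter are hypothetical, introduced only for illustration. The sketch also handles a case the one-line slice does not: str.find returns -1 when no period is present, which would silently cut off the last character.

```python
def text_after_prefix(line, prefix, terminator='.'):
    """Return the text between prefix and the first terminator, or None."""
    if not line.startswith(prefix):
        return None
    # Search for the terminator only after the prefix.
    end = line.find(terminator, len(prefix))
    if end == -1:
        # No terminator found: keep everything up to the end of the line.
        end = len(line)
    return line[len(prefix):end]


with_period = text_after_prefix("test: match this.", "test:")
no_period = text_after_prefix("test: no period here", "test:")
wrong_prefix = text_after_prefix("other: match this.", "test:")

print(repr(with_period))   # ' match this'
print(repr(no_period))     # ' no period here'
print(wrong_prefix)        # None
```

As in the original example, the leading space is kept; call .strip() on the result if it is not wanted.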
