
How to hide text after matching and before keyword in regular expressions?

I would like to match every user comment up to KEYWORD. I would also like to skip the variable, unimportant text that sits between the last comment and the keyword.

import re

string = '''
COMMENTS:  
first comment /user_x  
second comment
two lines /user_y
Here is some unimportant text.  
KEYWORD:
Don't match comments or anything else after first keyword like this /user_x  
KEYWORD: <- again
Also ignore the same keyword, which could appear several times.
'''

My result doesn't skip the unimportant text.

pattern = re.compile(r'(?<=COMMENTS:)(.+?/(user_x|user_y))+?(?:.+?)(?=KEYWORD:)', flags=re.DOTALL)
match = re.search(pattern, string).group(0)

print(match)

I would like to have the following output:

first comment /user_x  
second comment
two lines /user_y

What am I doing wrong? Thanks a lot.

You may use

pattern = re.compile(r'COMMENTS:\s*((?:(?:(?!KEYWORD:).)+?/(?:user_x|user_y))+).+?KEYWORD:', flags=re.DOTALL)
match = re.search(pattern, string)
if match:
    print(match.group(1))

The output no longer contains the irrelevant line:

first comment /user_x  
second comment
two lines /user_y
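
The capturing group is what makes the difference. `match.group(0)` is always the whole match, so any text consumed by the pattern (lazy or not, inside a non-capturing group or not) ends up in it; only lookarounds are excluded. A minimal sketch on a shortened, made-up input (the `text` value and the variable names are illustrative, not from the question):

```python
import re

# Shortened, hypothetical input with the same shape as the question's string.
text = "COMMENTS:\nfirst /user_x\nfiller text\nKEYWORD:\nlater /user_y\nKEYWORD:"

pattern = re.compile(
    r'COMMENTS:\s*((?:(?:(?!KEYWORD:).)+?/(?:user_x|user_y))+).+?KEYWORD:',
    flags=re.DOTALL,
)

m = pattern.search(text)
whole = m.group(0)     # everything the pattern consumed, filler included
captured = m.group(1)  # only the comments

print(repr(whole))
print(repr(captured))
```

Here `whole` still contains the filler line and the first `KEYWORD:`, while `captured` holds only the comment.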

See the Python demo.

Details

  • COMMENTS: - the literal substring COMMENTS:
  • \s* - zero or more whitespace characters
  • ((?:(?:(?!KEYWORD:).)+?/(?:user_x|user_y))+) - Capturing group 1 (match.group(1) will hold this value if there is a match): one or more repetitions of
    • (?:(?!KEYWORD:).)+? - any char, one or more but as few as possible, that does not start the KEYWORD: char sequence
    • / - a / char
    • (?:user_x|user_y) - user_x or user_y
  • .+?KEYWORD: - any one or more chars, as few as possible, followed by KEYWORD:
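
The breakdown above can also be written into the pattern itself with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch of the same regex laid out differently; the variable names `text` and `comments` are illustrative, and the sample is a trimmed copy of the question's string:

```python
import re

text = '''
COMMENTS:
first comment /user_x
second comment
two lines /user_y
Here is some unimportant text.
KEYWORD:
Don't match comments or anything else after first keyword like this /user_x
KEYWORD: <- again
'''

pattern = re.compile(r'''
    COMMENTS:\s*                  # literal marker, then optional whitespace
    (                             # group 1: the run of user comments
      (?:
        (?:(?!KEYWORD:).)+?       # as few chars as possible, never starting KEYWORD:
        /(?:user_x|user_y)        # each comment ends with a user tag
      )+
    )
    .+?KEYWORD:                   # skipped filler up to the first KEYWORD:
''', flags=re.DOTALL | re.VERBOSE)

match = pattern.search(text)
comments = match.group(1) if match else None
print(comments)
```

Because the negative lookahead stops every comment from running past `KEYWORD:`, the later `/user_x` after the first keyword can never be pulled into group 1.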

See the regex demo.

