
Regex Search end of line and beginning of next line

I am trying to come up with a regex that matches a keyword at the end of a line together with the word at the beginning of the next line (if present).

I have tried the regex below, but it does not return the desired result:

re.compile(fr"\s(?!^)(keyword1|keyword2|keyword3)\s*\$\n\r\((\w+\W+|W+\w+))", re.MULTILINE | re.IGNORECASE)

My input, for example, is:

sentence = """ This is my keyword
/n value"""

The output in the above case should be: keyword value

Thanks in advance

You could match the keyword (or use an alternation to match more keywords) and allow for optional tabs and spaces both after the keyword and after the newline.

Using 2 capturing groups as in the pattern you tried:

(?<!\S)(keyword)[\t ]*\r?\n[\t ]*(\w+)(?!\S)

Explanation

  • (?<!\S) Negative lookbehind, assert what is directly on the left is not a non-whitespace char
  • (keyword) Capture in group 1 matching the keyword
  • [\t ]* Match 0+ tabs or spaces
  • \r?\n Match newline
  • [\t ]* Match 0+ tabs or spaces
  • (\w+) Capture group 2 match 1+ word chars
  • (?!\S) Negative lookahead, assert what is directly on the right is not a non-whitespace char


For example:

import re

regex = r"(?<!\S)(keyword)[\t ]*\r?\n[\t ]*(\w+)(?!\S)"
test_str = (" This is my keyword\n"
    " value")

matches = re.search(regex, test_str)

if matches:
    print('{} {}'.format(matches.group(1), matches.group(2)))

Output

keyword value
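The same pattern extends to several keywords through an alternation, as mentioned above. The following is a minimal sketch, assuming a hypothetical keyword list (keyword1, keyword2, keyword3, as in the question) and a made-up sample text; re.IGNORECASE is added because the question used it.

```python
import re

# Hypothetical keyword list taken from the question's pattern.
# re.escape guards against keywords that contain regex metacharacters.
keywords = ["keyword1", "keyword2", "keyword3"]
alternation = "|".join(re.escape(k) for k in keywords)

pattern = re.compile(
    rf"(?<!\S)({alternation})[\t ]*\r?\n[\t ]*(\w+)(?!\S)",
    re.IGNORECASE,
)

# Made-up sample: two keywords end a line, the third is mid-line.
text = (
    "first Keyword1  \n"
    " alpha\n"
    "second keyword3\r\n"
    "\tbeta rest\n"
    "no match keyword2 here\n"
)

# Collect (keyword, value) pairs for every keyword that ends a line.
pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(pairs)  # [('Keyword1', 'alpha'), ('keyword3', 'beta')]
```

keyword2 is skipped because it is followed by more text on the same line, so \r?\n cannot match right after the optional tabs and spaces.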

How about \b(keyword)\n(\w+)\b?

\b(keyword)\n(\w+)\b

\b                      get a word boundary
  (keyword)             capture keyword (replace with whatever you want)
           \n           match a newline
             (\w+)      capture some word characters, one or more
                  \b    get a word boundary

Because keyword and \w+ are in capture groups, you can reference them as you wish later in your code.
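As a rough illustration of how the groups can be read back in Python, here is a small sketch with made-up sample strings. Note that this pattern expects the newline to follow the keyword directly and the value to start at the very beginning of the next line, so the indented input from the question would not match.

```python
import re

pattern = re.compile(r"\b(keyword)\n(\w+)\b")

# Matches: the newline directly follows the keyword and the
# value is the first thing on the next line.
tight = "This is my keyword\nvalue"
m = pattern.search(tight)
print(m.groups() if m else None)  # ('keyword', 'value')

# No match: the next line starts with a space, which \n(\w+)
# does not allow for.
indented = "This is my keyword\n value"
print(pattern.search(indented))  # None
```

If leading or trailing whitespace around the line break is possible, allowing optional tabs and spaces on both sides of the newline would be needed.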


My guess is that, depending on the number of newlines that you might have, an expression similar to:

\b(keyword1|keyword2|keyword3)\b[\r\n]{1,2}(\S+)

might be somewhat close, with the value in group 2 (\2). You can also make the first group non-capturing:

\b(?:keyword1|keyword2|keyword3)\b[\r\n]{1,2}(\S+)

in which case the value is in group 1 (\1).
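A short sketch of the non-capturing variant in Python, assuming the character class is meant to be [\r\n] (carriage return or line feed) and using a made-up sample text. With a single capture group, re.findall returns just the values.

```python
import re

# Assumption: the class is [\r\n], so one or two line-break characters
# (\n, \r\n, or \n\n) may sit between the keyword and the value.
pattern = re.compile(r"\b(?:keyword1|keyword2|keyword3)\b[\r\n]{1,2}(\S+)")

# Made-up sample: a Unix line break, a Windows line break, and a
# keyword followed by a trailing space before the line break.
text = (
    "foo keyword1\nvalue1\n"
    "bar keyword2\r\nvalue2\n"
    "baz keyword3 \nvalue3\n"
)

values = pattern.findall(text)
print(values)  # ['value1', 'value2']
```

keyword3 yields nothing here because of the trailing space before the line break, and (\S+) would also not skip indentation on the next line. Since {1,2} accepts \n\n as well, a value separated from the keyword by one empty line would still be picked up.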


If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against some sample inputs.

