
Regex pattern to get answers between two questions

How do I get the text between the end of the question (starting after ?) and the text before the next question that starts with "Question"?

The answers are separated by new lines.

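import re

text = """Which feature is not part of the linux system?
pipe
2) dirx
ls
ps

Question 2 ("""

# Attempt from the question; it does not work as written:
# re.findall returns a list, and a list has no .split() method.
output = re.findall(r'\?\s*(.*?)\s*Question\)', text).split('\n')
print(output)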

You can use this regex to match the text between ? and the next line that starts with Question:

(?s)(?<=\?).+?(?=\nQuestion )


Explanation:

  • (?s) : Enable DOTALL mode so that . also matches line breaks
  • (?<=\?) : Lookbehind asserting that a ? comes immediately before the current position
  • .+? : Lazily match one or more of any character, including line breaks
  • (?=\nQuestion ) : Lookahead asserting that a line break followed by "Question " comes right after the current position
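To try the lookaround pattern in Python, something like the following minimal sketch should work. It uses the sample text from the question; the helper name extract_answers is hypothetical and only there for illustration.

```python
import re

# Sample text from the question.
text = ("Which feature is not part of the linux system?\n"
        "pipe\n"
        "2) dirx\n"
        "ls\n"
        "ps\n\n"
        "Question 2 (")

# (?s) lets . cross line breaks; the lookbehind and lookahead
# keep the ? and the "Question" line out of the match itself.
PATTERN = re.compile(r'(?s)(?<=\?).+?(?=\nQuestion )')

def extract_answers(s):
    """Hypothetical helper: one list of non-blank answer lines per match."""
    return [[line for line in block.splitlines() if line.strip()]
            for block in PATTERN.findall(s)]

print(extract_answers(text))
```

Each match still contains the surrounding line breaks, so the sketch splits it into lines and drops the blank ones.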

Alternatively, you can use a capture group that matches the lines in between, that is, lines that neither end with a question mark nor start with Question:

^.*\?((?:\n(?!.*\?$|Question\b).*)*)
  • ^ Start of string
  • .*\? Match a line ending in ?
  • ( Capture group 1 (which will be returned by re.findall)
    • (?: Non-capture group to repeat as a whole
      • \n(?!.*\?$|Question\b) Match a newline, and assert that the following line does not end with ? and does not start with Question
      • .* If the assertions are true, match the whole line
    • )* Close the non-capture group and optionally repeat it
  • ) Close group 1


For example

import re

text = ("Which feature is not part of the linux system?\n"
        "pipe\n"
        "2) dirx\n"
        "ls\n"
        "ps\n\n"
        "Question 2 (")

output = re.findall(r'^.*\?((?:\n(?!.*\?$|Question\b).*)*)', text)
print(output)

Output

['\npipe\n2) dirx\nls\nps\n']
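If the input holds several questions, the same pattern can be applied with re.MULTILINE so that ^ and $ anchor at every line. The following is only a sketch under that assumption: the second question, the "(1 point)" headers, and the helper name answers_per_question are made up for illustration and do not come from the original post.

```python
import re

# Hypothetical input with two questions; only the first block
# of answers comes from the original question.
text = ("Question 1 (1 point)\n"
        "Which feature is not part of the linux system?\n"
        "pipe\n"
        "2) dirx\n"
        "ls\n"
        "ps\n\n"
        "Question 2 (1 point)\n"
        "What prints the working directory?\n"
        "pwd\n"
        "cd\n")

# Same pattern as above; re.MULTILINE makes ^ and $ match per line.
PATTERN = re.compile(r'^.*\?((?:\n(?!.*\?$|Question\b).*)*)', re.MULTILINE)

def answers_per_question(s):
    """Hypothetical helper: list of answer lines for each question."""
    return [block.strip().splitlines() for block in PATTERN.findall(s)]

print(answers_per_question(text))
```

Note that in this sketch any answer line that itself ends with ? would be treated as the start of a new question.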

