Regex pattern to get answers between two questions

How do I get the text between the end of a question (everything after the ?) and the start of the next question, which begins with "Question"?

The answers are separated by newlines.

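import re

# Triple-quoted string so the sample can span several lines
text = """Which feature is not part of the linux system?
pipe
2) dirx
ls
ps

Question 2 ("""

# Capture everything between "?" and the next "Question";
# re.DOTALL lets "." match newlines as well
matches = re.findall(r'\?\s*(.*?)\s*Question', text, flags=re.DOTALL)
# findall returns a list, so split each captured block into lines
output = [m.split('\n') for m in matches]
print(output)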

You may use this regex to match the required text between ? and Question:

(?s)(?<=\?).+?(?=\nQuestion )

RegEx Demo

Explanation:

  • (?s) : Enable DOTALL mode so that . also matches line breaks
  • (?<=\?) : Lookbehind to assert that there is a ? just before the current position
  • .+? : Match 1+ of any characters, including line breaks, as few as possible
  • (?=\nQuestion ) : Lookahead to assert that a line break followed by Question is ahead of the current position
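As a minimal sketch of how this pattern could be applied in Python (the variable names and the filtering of blank lines are assumptions for illustration, not part of the answer above), note that the match starts right after the ? and therefore still contains the surrounding newlines, which have to be stripped before the answers are usable as a list:

```python
import re

text = (
    "Which feature is not part of the linux system?\n"
    "pipe\n"
    "2) dirx\n"
    "ls\n"
    "ps\n\n"
    "Question 2 ("
)

# (?s) must stay at the very start of the pattern
pattern = re.compile(r"(?s)(?<=\?).+?(?=\nQuestion )")

match = pattern.search(text)
# The raw match begins and ends with a newline, so drop blank lines
answers = [line for line in match.group().splitlines() if line.strip()] if match else []
print(answers)
```

If no ? followed by a later "Question " line is found, search returns None and the sketch falls back to an empty list.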

You might use a capture group, matching the lines in between that do not end with a question mark and do not start with Question:

^.*\?((?:\n(?!.*\?$|Question\b).*)*)
  • ^ Start of string
  • .*\? Match a line ending in ?
  • ( Capture group 1 (which will be returned by re.findall)
    • (?: Non-capture group to repeat as a whole
      • \n(?!.*\?$|Question\b) Match a newline, and assert that the line does not end with ? or start with Question
      • .* If the assertion is true, match the whole line
    • )* Close the non-capture group and optionally repeat it
  • ) Close group 1

Regex demo

For example:

import re

text = ("Which feature is not part of the linux system?\n"
        "pipe\n"
        "2) dirx\n"
        "ls\n"
        "ps\n\n"
        "Question 2 (")

output = re.findall(r'^.*\?((?:\n(?!.*\?$|Question\b).*)*)', text)
print(output)

Output:

['\npipe\n2) dirx\nls\nps\n']
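Without flags, ^ and $ only anchor at the start and end of the whole string, so the example above finds a single question. A rough sketch of how the same pattern might be extended to several questions is to add re.MULTILINE so that both anchors work per line. The two-question sample text below, including the "(1 point)" wording and the second question, is made up for illustration and is not from the original post:

```python
import re

# Hypothetical sample with two questions (the second one is invented)
text = (
    "Question 1 (1 point) Which feature is not part of the linux system?\n"
    "pipe\n"
    "2) dirx\n"
    "ls\n"
    "ps\n\n"
    "Question 2 (1 point) Which command lists files?\n"
    "cd\n"
    "ls\n"
    "rm\n"
)

# re.MULTILINE makes ^ and $ match at every line boundary
pattern = re.compile(r"^.*\?((?:\n(?!.*\?$|Question\b).*)*)", re.MULTILINE)

# findall returns group 1 for each question; split it and drop blank lines
answers = [
    [line for line in block.splitlines() if line.strip()]
    for block in pattern.findall(text)
]
print(answers)
```

Each inner list holds the answer lines for one question, in the order the questions appear.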
