Python regex - get contents in between

I have a Word/text file containing:

1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.
(A)15kJ
(B)23kJ
(C)32kJ
(D)50kJ

[Answer]:(B)

[QuestionType]:single_correct

2. Which of the following statement is correct

(A)Li is hander than the other alkali metals.
(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.
(C)Na2CO3 is pearl ash.
(D)Berylium and Aluminium ions do not have strong tendency to form complexes like 

[Answer]:(C)

[QuestionType]:single_correct

I need to get each question in a separate list, starting from the question number and ending at the [QuestionType] line

(from 1. to [QuestionType]).

Expected output:

[[1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.,(A)15kJ,(B)23kJ,(C)32kJ,(D)50kJ,[Answer]:(B),[QuestionType]:single_correct],
[2. Which of the following statement is correct,(A)Li is hander than the other alkali metals.,(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.,(C)Na2CO3 is pearl ash.,(D)Berylium and Aluminium ions do not have strong tendency to form complexes like ,[Answer]:(C),[QuestionType]:single_correct]]

I tried a for loop, but I am not able to get the contents in between:

import docx
import re
doc = docx.Document("QnA.docx")
for i in doc.paragraphs:
    if re.match(r"^[0-9]+[.]+",i.text):
        print(i.text) # matched number condition
    if re.match(r"(^\[QuestionType\])",i.text):
        print(i.text) # matched QuestionType condition

You might use a single pattern that starts the match at a line beginning with 1 or more digits and a dot.

Then continue matching all the lines that do not start with [QuestionType], and finally match the [QuestionType] line itself.

^\d+\..*(?:\r?\n(?!\[QuestionType]).*)*\r?\n\[QuestionType]:.*
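To make the parts of the pattern easier to follow, here is a minimal sketch of the same pattern written with re.VERBOSE so each piece can carry a comment. The name QUESTION_BLOCK and the shortened sample text are made up for illustration; they are not part of the original post.

```python
import re

# The same pattern, spelled out with re.VERBOSE so each part can carry a comment.
# QUESTION_BLOCK is a hypothetical name chosen for this sketch.
QUESTION_BLOCK = re.compile(
    r"""
    ^\d+\..*                  # a line starting with a question number and a dot
    (?:                       # then, zero or more times:
        \r?\n                 #   a line break
        (?!\[QuestionType])   #   that is not followed by the QuestionType marker
        .*                    #   and the rest of that line
    )*
    \r?\n\[QuestionType]:.*   # finally the QuestionType line itself
    """,
    re.MULTILINE | re.VERBOSE,
)

# Shortened, invented sample in the same layout as the file in the question.
sample = (
    "1. First question\n(A)x\n(B)y\n\n[Answer]:(A)\n\n"
    "[QuestionType]:single_correct\n\n"
    "2. Second question\n[Answer]:(B)\n[QuestionType]:single_correct"
)

blocks = QUESTION_BLOCK.findall(sample)
print(len(blocks))
```

re.MULTILINE is what lets ^ match at the start of every line rather than only at the start of the string.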

See a regex demo and a Python demo.

For example:

import re

regex = r"^\d+\..*(?:\r?\n(?!\[QuestionType]).*)*\r?\n\[QuestionType]:.*"

s = ("1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.\n"
    "(A)15kJ\n"
    "(B)23kJ\n"
    "(C)32kJ\n"
    "(D)50kJ\n\n"
    "[Answer]:(B)\n\n"
    "[QuestionType]:single_correct\n\n"
    "2. Which of the following statement is correct\n\n"
    "(A)Li is hander than the other alkali metals.\n"
    "(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.\n"
    "(C)Na2CO3 is pearl ash.\n"
    "(D)Berylium and Aluminium ions do not have strong tendency to form complexes like \n\n"
    "[Answer]:(C)\n\n"
    "[QuestionType]:single_correct")
    
print(re.findall(regex, s, re.M))

Output:

['1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.\n(A)15kJ\n(B)23kJ\n(C)32kJ\n(D)50kJ\n\n[Answer]:(B)\n\n[QuestionType]:single_correct', '2. Which of the following statement is correct\n\n(A)Li is hander than the other alkali metals.\n(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.\n(C)Na2CO3 is pearl ash.\n(D)Berylium and Aluminium ions do not have strong tendency to form complexes like \n\n[Answer]:(C)\n\n[QuestionType]:single_correct']
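The output above is one string per question, while the question asks for one list per question. A minimal sketch of that last step, assuming blank lines should be dropped as in the expected output (split_questions is a hypothetical helper name, and the sample text is shortened and invented):

```python
import re

regex = r"^\d+\..*(?:\r?\n(?!\[QuestionType]).*)*\r?\n\[QuestionType]:.*"

def split_questions(text):
    """Return one list of non-empty lines per question block found in text."""
    return [
        [line for line in block.splitlines() if line.strip()]
        for block in re.findall(regex, text, re.M)
    ]

# Shortened, invented sample in the same layout as the file in the question.
sample = (
    "1. Q one\n(A)a\n(B)b\n\n[Answer]:(A)\n\n[QuestionType]:single_correct\n\n"
    "2. Q two\n\n(A)c\n\n[Answer]:(A)\n\n[QuestionType]:single_correct"
)

print(split_questions(sample))
```

str.splitlines() is used instead of split("\n") so that \r\n line endings, which the pattern also accepts, do not leave stray \r characters in the items.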

First, get the content of each question using a regex. Then split each question's content on \n.

You could try the following regex.

\d+\.[\s\S]+?QuestionType.*
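One thing to keep in mind: this pattern is not anchored to the start of a line, so any "digits followed by a dot" earlier in the text can start a match. The sketch below shows the difference on an invented input with a header line ("Quiz version 2.0" is an assumption for illustration, not part of the original file), next to a line-anchored variant.

```python
import re

# Invented input: a header line containing "2." precedes the first question.
text = "Quiz version 2.0\n1. Q one\n[Answer]:(A)\n[QuestionType]:single_correct\n"

# Unanchored: the match may begin at any digits-plus-dot, here inside "2.0".
unanchored = re.findall(r"\d+\.[\s\S]+?QuestionType.*", text)

# Anchored variant: with re.M, ^ only matches at the start of a line.
anchored = re.findall(r"^\d+\.[\s\S]+?^\[QuestionType\].*", text, re.M)

print(unanchored[0].splitlines()[0])  # first line of the unanchored match
print(anchored[0].splitlines()[0])    # first line of the anchored match
```

For the sample file in the question, which starts directly with "1.", both forms find the same two blocks.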

I also tested it in Python:

import re
content = '''1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.
(A)15kJ
(B)23kJ
(C)32kJ
(D)50kJ

[Answer]:(B)

[QuestionType]:single_correct

2. Which of the following statement is correct

(A)Li is hander than the other alkali metals.
(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.
(C)Na2CO3 is pearl ash.
(D)Berylium and Aluminium ions do not have strong tendency to form complexes like 

[Answer]:(C)

[QuestionType]:single_correct
'''

splitQuestion = re.findall(r"\d+\.[\s\S]+?QuestionType.*", content)

result = []
for eachQuestion in splitQuestion:
    result.append(eachQuestion.split("\n"))

print(result)

Result:

[['1. 10 Liter sample of an ideal gas is expanded reversibly and isothermally at 300k from initial pressure of 10atm to final pressure of 1atm. The heat absorbed by gas during the process is approximately.', '(A)15kJ', '(B)23kJ', '(C)32kJ', '(D)50kJ', '', '[Answer]:(B)', '', '[QuestionType]:single_correct'], ['2. Which of the following statement is correct', '', '(A)Li is hander than the other alkali metals.', '(B)In solvay process NH3 is recovered when the solution containing NH4Cl is treated with H2O.', '(C)Na2CO3 is pearl ash.', '(D)Berylium and Aluminium ions do not have strong tendency to form complexes like ', '', '[Answer]:(C)', '', '[QuestionType]:single_correct']]
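If you would rather keep the line-by-line loop from the question, the same grouping can be done without a multi-line regex. This is a minimal sketch, assuming each question starts on a line beginning with a number and a dot and ends on a line beginning with [QuestionType]; group_questions is a hypothetical helper name and the sample lines are invented.

```python
import re

def group_questions(lines):
    """Group lines into questions: open at "N." and close at "[QuestionType]"."""
    questions, current = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines / empty paragraphs
        if current is None:
            if re.match(r"\d+\.", line):
                current = [line]  # a numbered line opens a new question
        else:
            current.append(line)
            if line.startswith("[QuestionType]"):
                questions.append(current)  # the marker line closes it
                current = None
    return questions

# Shortened, invented sample in the same layout as the file in the question.
lines = [
    "1. Q one", "(A)a", "", "[Answer]:(A)", "", "[QuestionType]:single_correct",
    "", "2. Q two", "(A)b", "[Answer]:(A)", "[QuestionType]:single_correct",
]
print(group_questions(lines))
```

With the python-docx document from the question, the call would presumably be group_questions(p.text for p in doc.paragraphs). Note that this sketch strips leading and trailing whitespace from every line, unlike the regex answers.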
