Python regex: select everything before and after a specific string

Question

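I am trying to apply a regular expression to a column of a pandas DataFrame that contains text data, and I want to extract a specific block. Here is a sample of my data: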

Patient Name :
NHI:  ABC2134
DOB:  10/03/1737

Patient Referred from: WTH ABC
Exam performed at:  XYZ Hospital Radiology
Reference:   ABCADAFAD
Date of exam:   12/11/2019
Examination(s) included in this report:
 CT Head

INDICATION:
Fall some time ago with ataxia since. Recent admission with 
tachybrady syndrome.

Answer 1

This is because your regex is:

(?s)Patient Referred(.*?)(?:(?:\r*\n){2})
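
As a rough illustration of what that pattern does, the sketch below runs it against the sample report from the question. The variable names `report` and `captured` are assumptions made for this example. Because the lazy group only starts after the literal "Patient Referred", nothing before that phrase is captured, and the match stops at the first blank line that follows.

```python
import re

# Sample report from the question (the name `report` is hypothetical).
report = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "DOB:  10/03/1737\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Exam performed at:  XYZ Hospital Radiology\n"
    "Reference:   ABCADAFAD\n"
    "Date of exam:   12/11/2019\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since. Recent admission with \n"
    "tachybrady syndrome."
)

pattern = re.compile(r"(?s)Patient Referred(.*?)(?:(?:\r*\n){2})")
match = pattern.search(report)

# group(1) holds only the text between "Patient Referred" and the blank line.
captured = match.group(1) if match else None
print(captured)
```

The captured text begins with " from: WTH ABC" and ends with " CT Head"; the patient header lines above "Patient Referred" are not part of it.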

Answer 2

Could you try re.match(r'(?sm).+CT Head', st).group(0)?

(?sm) turns on re.DOTALL and re.MULTILINE.

We use re.match() because we are matching from the beginning of the string.
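
A minimal sketch of this suggestion follows. It assumes `st` holds the sample report from the question as a single string; the name `result` is introduced here for illustration only.

```python
import re

# Assumption: `st` is the full report text from the question.
st = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "DOB:  10/03/1737\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Exam performed at:  XYZ Hospital Radiology\n"
    "Reference:   ABCADAFAD\n"
    "Date of exam:   12/11/2019\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since. Recent admission with \n"
    "tachybrady syndrome."
)

# re.match anchors at the start of the string; with DOTALL, ".+" also
# crosses newlines, so the match runs from the start through "CT Head".
m = re.match(r"(?sm).+CT Head", st)
result = m.group(0) if m else None
print(result)
```

Guarding against `None` avoids an AttributeError on reports that do not contain "CT Head". Since `.+` is greedy, the match extends to the last "CT Head" in the string if the phrase occurs more than once. The pattern contains no `^` or `$`, so re.MULTILINE has no effect on this particular match.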

Answer 3

In pandas, you can also use the extract method.

import pandas as pd
import re

# Create a sample dataframe
df = pd.DataFrame([
    {'diagnosis': '''Patient Name :
NHI:  ABC2134
DOB:  10/03/1737

Patient Referred from: WTH ABC
Exam performed at:  XYZ Hospital Radiology
Reference:   ABCADAFAD
Date of exam:   12/11/2019
Examination(s) included in this report:
 CT Head

INDICATION:
Fall some time ago with ataxia since. Recent admission with 
tachybrady syndrome.'''}
])

pat = re.compile(r'^(.*Patient Referred.*?)(?:\r?\n){2}', re.DOTALL)
df_extracted = df.diagnosis.str.extract(pat, expand=True)
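
To see what the extraction returns, here is a small self-contained sketch. The DataFrame `reports`, the shortened report text, the second row, and the `header` column name are all assumptions added for illustration; the pattern is the one from the answer, passed as a string with `flags=re.DOTALL`.

```python
import re
import pandas as pd

# Hypothetical two-row frame: one abbreviated report and one row
# that does not contain the expected header at all.
reports = pd.DataFrame({
    "diagnosis": [
        "Patient Name :\nNHI:  ABC2134\n\n"
        "Patient Referred from: WTH ABC\n"
        "Examination(s) included in this report:\n CT Head\n\n"
        "INDICATION:\nFall some time ago with ataxia since.",
        "Free text without the expected header.",
    ]
})

pattern = r"^(.*Patient Referred.*?)(?:\r?\n){2}"

# expand=False returns a Series when the pattern has a single group.
reports["header"] = reports["diagnosis"].str.extract(
    pattern, flags=re.DOTALL, expand=False
)
print(reports["header"])
```

Rows where the pattern does not match come back as missing values, so a follow-up `dropna()` or `fillna()` may be needed depending on how the column is used.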

Answer 4

You can match the following (with re.DOTALL):

^.+\r?\n *CT Head\r?\n

This regex can be broken down as follows.

^            # match beginning of string
.+           # match one or more characters, including line terminators
\r?\n        # match line terminator (CR/LF or LF)
[ ]*CT Head  # match zero or more spaces followed by "CT Head"
\r?\n        # match line terminator (CR/LF or LF)

In the above, I put the space in a character class ([ ]) merely to make it visible. The \r? is needed for files created on Windows.
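
The breakdown above maps fairly directly onto Python's re.VERBOSE mode, sketched below. The names `report`, `header_pattern`, and `header` are assumptions for this example. One caveat: in verbose mode unescaped whitespace in the pattern is ignored, so the space inside "CT Head" also has to be written as `[ ]`.

```python
import re

# Sample report from the question (the name `report` is hypothetical).
report = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "DOB:  10/03/1737\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Exam performed at:  XYZ Hospital Radiology\n"
    "Reference:   ABCADAFAD\n"
    "Date of exam:   12/11/2019\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since. Recent admission with \n"
    "tachybrady syndrome."
)

header_pattern = re.compile(
    r"""
    ^               # beginning of the string
    .+              # one or more characters, including line terminators
    \r?\n           # line terminator (CR/LF or LF)
    [ ]*CT[ ]Head   # zero or more spaces followed by "CT Head"
    \r?\n           # line terminator (CR/LF or LF)
    """,
    re.DOTALL | re.VERBOSE,
)

m = header_pattern.match(report)
header = m.group(0) if m else None
print(header)
```

The match includes the line terminator after "CT Head", so `header` ends with a newline; call `rstrip()` on it if that is not wanted.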

Alternatively, you can convert matches of the following regex (with re.DOTALL) to empty strings.

(?:(?<= CT Head\n)|(?<= CT Head\r\n)).*

This regex can be broken down as follows.

(?:                 # begin non-capture group 
  (?<= CT Head\n)   # current position is preceded by " CT Head\n"   
|
  (?<= CT Head\r\n) # current position is preceded by " CT Head\r\n"   
)
.*                  # match zero or more characters (to end of string)

(?<=...) is a positive lookbehind. Note that Python's re module does not support variable-length lookbehinds, such as

(?<= CT Head\r?\n)

That is why two lookbehinds are needed.
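
The substitution approach and the fixed-width restriction can both be checked with a short sketch. The abbreviated `report` text and the names `strip_tail`, `header`, `header_crlf`, and `variable_width_ok` are assumptions introduced for this example; it uses only the standard-library re module.

```python
import re

# Abbreviated, hypothetical version of the report from the question.
report = (
    "Patient Referred from: WTH ABC\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since."
)

# Two fixed-width lookbehinds cover LF and CR/LF line endings.
strip_tail = re.compile(
    r"(?:(?<= CT Head\n)|(?<= CT Head\r\n)).*", re.DOTALL
)

# Replace everything after the " CT Head" line with the empty string.
header = strip_tail.sub("", report)
header_crlf = strip_tail.sub("", report.replace("\n", "\r\n"))

# A single lookbehind with an optional \r has no fixed width, which
# the re module rejects when the pattern is compiled.
try:
    re.compile(r"(?<= CT Head\r?\n).*")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(repr(header))
print(variable_width_ok)
```

Both line-ending styles leave only the text up to and including the " CT Head" line, while the single-lookbehind variant fails to compile.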

4 answers

Answer 1
0 2022-02-06 03:10:34

Answer 2
0 2022-02-06 03:11:16

Answer 3
0 2022-02-06 03:25:06

Answer 4
0 2022-02-06 05:35:52
