
python regex to select everything before and after a particular string

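I am trying to apply a regex to a column of a pandas DataFrame that contains text data, and I want to extract a specific block from it. Here is a sample of my data: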

Patient Name :
NHI:  ABC2134
DOB:  10/03/1737

Patient Referred from: WTH ABC
Exam performed at:  XYZ Hospital Radiology
Reference:   ABCADAFAD
Date of exam:   12/11/2019
Examination(s) included in this report:
 CT Head

INDICATION:
Fall some time ago with ataxia since. Recent admission with 
tachybrady syndrome. 

This is because your regex is:

(?s)Patient Referred(.*?)(?:(?:\r*\n){2})
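To see what this pattern actually selects, here is a minimal sketch using only the standard re module on the sample text from the question. The variable names (text, pattern, match, captured) are illustrative and not from the original post. The match begins at "Patient Referred", so the header lines before it are not part of the match, and the capture group stops at the first blank line.

```python
import re

# Sample report text copied from the question.
text = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "DOB:  10/03/1737\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Exam performed at:  XYZ Hospital Radiology\n"
    "Reference:   ABCADAFAD\n"
    "Date of exam:   12/11/2019\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since. Recent admission with \n"
    "tachybrady syndrome."
)

# (?s) is the inline form of re.DOTALL: "." also matches newlines.
pattern = re.compile(r"(?s)Patient Referred(.*?)(?:(?:\r*\n){2})")
match = pattern.search(text)

# Group 1 holds only what follows "Patient Referred", up to the blank line.
captured = match.group(1)
print(repr(match.group(0)))
print(repr(captured))
```

Because the pattern is not anchored at the start of the string and the literal "Patient Referred" sits outside the group, neither the text before that phrase nor the phrase itself ends up in group 1.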

In pandas, you can also use the extract method.

import pandas as pd
import re

# Create a sample dataframe
df = pd.DataFrame([
    {'diagnosis': '''Patient Name :
NHI:  ABC2134
DOB:  10/03/1737

Patient Referred from: WTH ABC
Exam performed at:  XYZ Hospital Radiology
Reference:   ABCADAFAD
Date of exam:   12/11/2019
Examination(s) included in this report:
 CT Head

INDICATION:
Fall some time ago with ataxia since. Recent admission with 
tachybrady syndrome.'''}
])

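# Capture everything from the start of the text through the block that
# begins with "Patient Referred", stopping at the blank line that follows it.
# re.DOTALL lets "." match newlines so the pattern can span several lines.
pat = re.compile(r'^(.*Patient Referred.*?)(?:\r?\n){2}', re.DOTALL)
df_extracted = df.diagnosis.str.extract(pat, expand=True)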

You can match the following (with re.DOTALL):

^.+\r?\n *CT Head\r?\n


This regex can be broken down as follows.

^            # match beginning of string
.+           # match one or more characters, including line terminators
\r?\n        # match line terminator (CR/LF or LF)
[ ]*CT Head  # match zero or more spaces followed by "CT Head"
\r?\n        # match line terminator (CR/LF or LF)

Above, I put the space in a character class ([ ]) only to make it visible. The \r? is needed for files created on Windows.
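As a quick check of the breakdown above, the following sketch applies the pattern to a shortened version of the sample text with both LF and CRLF line endings. The shortened text and the variable names are assumptions made for illustration; only the regex itself comes from the answer.

```python
import re

# Shortened sample (hypothetical excerpt of the question's data), LF endings.
lf_text = (
    "NHI:  ABC2134\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago"
)
# Same text as a Windows tool would write it, with CRLF endings.
crlf_text = lf_text.replace("\n", "\r\n")

# re.DOTALL makes ".+" run across line terminators.
pattern = re.compile(r"^.+\r?\n *CT Head\r?\n", re.DOTALL)

lf_match = pattern.match(lf_text).group(0)
crlf_match = pattern.match(crlf_text).group(0)

print(repr(lf_match))
print(repr(crlf_match))
```

Note that ".+" is greedy, so if "CT Head" appeared on more than one line, the match would extend through the last such line.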


Alternatively, you can replace the matches of the following regex (with re.DOTALL) with empty strings.

(?:(?<= CT Head\n)|(?<= CT Head\r\n)).*


This regex can be broken down as follows.

(?:                 # begin non-capture group
  (?<= CT Head\n)   # current position is preceded by " CT Head\n"
|
  (?<= CT Head\r\n) # current position is preceded by " CT Head\r\n"
)
.*                  # match zero or more characters (to end of string)

(?<=...) is a positive lookbehind. Note that Python does not support variable-length lookbehinds, such as

(?<= CT Head\r?\n)

That is why two lookbehinds are needed.
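The removal approach can be sketched with re.sub from the standard library. The shortened sample text and the variable names are assumptions made for illustration. The last few lines also show that the standard re module rejects the variable-length lookbehind when compiling it.

```python
import re

# Shortened sample (hypothetical excerpt of the question's data), LF endings.
lf_text = (
    "Patient Referred from: WTH ABC\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago"
)
crlf_text = lf_text.replace("\n", "\r\n")

# Two fixed-width lookbehinds, one per line-ending style.
tail = re.compile(r"(?:(?<= CT Head\n)|(?<= CT Head\r\n)).*", re.DOTALL)

# Replace everything after the " CT Head" line with an empty string.
trimmed_lf = tail.sub("", lf_text)
trimmed_crlf = tail.sub("", crlf_text)

# The single variable-length lookbehind fails to compile with the re module.
try:
    re.compile(r"(?<= CT Head\r?\n)")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(repr(trimmed_lf))
print(repr(trimmed_crlf))
print(variable_width_ok)
```

On a pandas column, the same compiled pattern could presumably be passed to Series.str.replace with regex=True, though that is not shown in the original answer.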
