python regex to select everything before and after a particular string

Question

I am trying to apply a regex to one column of a pandas DataFrame. The column holds free text, and I want to extract a specific block from it. This is a sample of what my data looks like:

Patient Name :
NHI:  ABC2134
DOB:  10/03/1737

Patient Referred from: WTH ABC
Exam performed at:  XYZ Hospital Radiology
Reference:   ABCADAFAD
Date of exam:   12/11/2019
Examination(s) included in this report:
 CT Head

INDICATION:
Fall some time ago with ataxia since. Recent admission with 
tachybrady syndrome.

Answer 1

It's because your regex is:

(?s)Patient Referred(.*?)(?:(?:\r*\n){2})
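To see what this pattern captures, it can be run against the sample report. The following is only a minimal sketch: the names sample and m are illustrative, and the sample text is shortened from the question.

```python
import re

# Shortened copy of the sample report from the question.
sample = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "DOB:  10/03/1737\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Exam performed at:  XYZ Hospital Radiology\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since.\n"
)

m = re.search(r'(?s)Patient Referred(.*?)(?:(?:\r*\n){2})', sample)

# The match begins at "Patient Referred", so the lines above it are not included,
# and the capture group holds only the text between that phrase and the first blank line.
whole_match = m.group(0)
captured = m.group(1)
```

On this sample the capture group starts right after "Patient Referred" and stops at " CT Head", so nothing before "Patient Referred" is selected.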

Answer 2

Can you try re.match(r'(?sm).+CT Head', st).group(0)?
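As a rough check of this suggestion, the call can be tried on a shortened copy of the sample. This is a sketch, assuming st holds the full text of one report; the None guard is an addition for reports that do not contain "CT Head".

```python
import re

# Shortened copy of the sample report from the question.
st = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "DOB:  10/03/1737\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since.\n"
)

# (?s) lets "." match newlines, so ".+" runs from the start of the text
# and backtracks to the last occurrence of "CT Head".
m = re.match(r'(?sm).+CT Head', st)
header = m.group(0) if m else None
```

Because .+ is greedy, the match ends at the last "CT Head" in the text, which matters if the phrase can appear again further down a report.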

Answer 3

In pandas, you can use the extract method as well.

import pandas as pd
import re

# Create a sample dataframe
df = pd.DataFrame([
    {'diagnosis': '''Patient Name :
NHI:  ABC2134
DOB:  10/03/1737

Patient Referred from: WTH ABC
Exam performed at:  XYZ Hospital Radiology
Reference:   ABCADAFAD
Date of exam:   12/11/2019
Examination(s) included in this report:
 CT Head

INDICATION:
Fall some time ago with ataxia since. Recent admission with 
tachybrady syndrome.'''}
])

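# DOTALL lets "." cross line breaks; the capture group keeps everything from the
# start of the text up to the first blank line after "Patient Referred".
pat = re.compile(r'^(.*Patient Referred.*?)(?:\r?\n){2}', re.DOTALL)
# extract returns a DataFrame with one column per capture group.
df_extracted = df.diagnosis.str.extract(pat, expand=True)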

Answer 4

You could match (with re.DOTALL):

^.+\r?\n *CT Head\r?\n


This regular expression can be broken down as follows.

^            # match beginning of string
.+           # match one or more characters, including line terminators
\r?\n        # match line terminator (CR/LF or LF)
[ ]*CT Head  # match zero or more spaces followed by "CT Head"
\r?\n        # match line terminator (CR/LF or LF)

In the above I put the space in a character class ([ ]) merely to make it visible. \r? is needed for files created on Windows, where lines end with CR/LF.
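A short sketch of how this pattern might be used follows. The names sample, sample_crlf and pattern are illustrative, and the CR/LF variant is built here only to exercise the optional \r.

```python
import re

# Shortened copy of the sample report from the question.
sample = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since.\n"
)
# Same text with Windows-style line endings.
sample_crlf = sample.replace("\n", "\r\n")

pattern = re.compile(r'^.+\r?\n *CT Head\r?\n', re.DOTALL)

m_lf = pattern.match(sample)
m_crlf = pattern.match(sample_crlf)

# Each match runs from the start of the text through the line terminator after "CT Head".
header_lf = m_lf.group(0)
header_crlf = m_crlf.group(0)
```

Unlike the earlier suggestions, this match keeps the line terminator after "CT Head"; call rstrip() on the result if it is not wanted.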

Alternatively, you could convert the match of the following regular expression (with re.DOTALL) to an empty string.

(?:(?<= CT Head\n)|(?<= CT Head\r\n)).*


This regular expression can be broken down as follows.

(?:                 # begin non-capture group 
  (?<= CT Head\n)   # current position is preceded by " CT Head\n"   
|
  (?<= CT Head\r\n) # current position is preceded by " CT Head\r\n"   
)
.*                  # match zero or more characters (to end of string)

(?<=...) is a positive lookbehind. Note that Python does not support variable-length lookbehinds such as

(?<= CT Head\r?\n)

which is why two lookbehinds are needed.
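The substitution approach and the lookbehind restriction can both be checked with the standard re module. This is a minimal sketch; sample, pattern and variable_length_ok are illustrative names, and the sample text is shortened.

```python
import re

# Shortened copy of the sample report from the question.
sample = (
    "Patient Name :\n"
    "NHI:  ABC2134\n"
    "\n"
    "Patient Referred from: WTH ABC\n"
    "Examination(s) included in this report:\n"
    " CT Head\n"
    "\n"
    "INDICATION:\n"
    "Fall some time ago with ataxia since.\n"
)

# Two fixed-width lookbehinds: one for LF, one for CR/LF line endings.
pattern = re.compile(r'(?:(?<= CT Head\n)|(?<= CT Head\r\n)).*', re.DOTALL)

# Replace everything after the " CT Head" line with an empty string.
header = pattern.sub('', sample)

# A single lookbehind containing the optional \r is rejected by Python's re module.
try:
    re.compile(r'(?<= CT Head\r?\n).*', re.DOTALL)
    variable_length_ok = True
except re.error:
    variable_length_ok = False
```

Here header keeps the text up to and including the " CT Head" line, and variable_length_ok ends up False because re raises an error for the variable-width lookbehind.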
