Thanks in advance for the help. I am using Python regular expressions to extract a part from a text which has the following layout:
(A lot of information)
time: 150
C-FXY
-- information ---
E-END
(A lot of information)
time: 5000
C-FXY
**--- INFORMATION I WANT TO EXTRACT ---**
E-END
(A lot of information)
time: 13000
C-FXY
-- information ---
E-END
(A lot of information)
I need to extract everything between C-FXY and E-END from the time step corresponding to 5000. For that I am using the following Python 3.6 statement:
time_step = '5000'
text_part = re.search(r'time.*'+time_step+'.*C-FXY(.*?)E-END', text, re.DOTALL).group(1)
Unfortunately, what I get as output is the section between C-FXY and E-END from the 13000 time step of the text, not the one I want from time: 5000.
Any help would be much appreciated. :)
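The error occurs because your regex contains a greedy .* between the time part and the C-FXY part. With re.DOTALL it also matches newlines, so it consumes everything up to the last C-FXY in the text, and the capture group ends up holding the final block.
It should be enough to use a non-greedy version here: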
text_part = re.search(r'time.*'+time_step+'.*?C-FXY(.*?)E-END', text, re.DOTALL).group(1)
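If you want the pattern to be a bit stricter, here is a minimal sketch that matches the whole time line, so that 5000 cannot match inside a value such as 15000, and escapes the time step before inserting it into the regex. The helper name extract_block and the sample text are made up for illustration, and it assumes each time step sits on its own line as "time: N". Unlike the one-liner above, it trims the whitespace around the captured block and returns None instead of raising when nothing matches.

```python
import re

def extract_block(text, time_step):
    """Return the text between C-FXY and E-END that follows 'time: <time_step>'."""
    pattern = (
        # Match the complete time line (MULTILINE makes ^ and $ work per line).
        r'^time:\s*' + re.escape(time_step) + r'\s*$'
        # Lazily skip to the first C-FXY after it and capture up to the first E-END.
        r'.*?C-FXY\s*(.*?)\s*E-END'
    )
    match = re.search(pattern, text, re.DOTALL | re.MULTILINE)
    return match.group(1) if match else None

# Hypothetical sample data with the same layout as in the question.
sample = """header
time: 150
C-FXY
first block
E-END
filler
time: 5000
C-FXY
wanted block
E-END
filler
time: 15000
C-FXY
last block
E-END
"""

print(extract_block(sample, '5000'))
```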
Anyway, I would not run a multiline search over the whole file here. I would just read the file line by line up to the time: 5000 line, then up to the next C-FXY line, store everything from there up to the E-END line, and stop processing there.
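The line-by-line idea could look roughly like this. It is only a sketch: the function name extract_block_by_lines is hypothetical, and it assumes the markers appear alone on their lines and that the time line reads exactly "time: 5000". It accepts any iterable of lines, so an open file object works as well as the io.StringIO used here for the demonstration.

```python
import io

def extract_block_by_lines(lines, time_step):
    """Collect the lines between C-FXY and E-END that follow 'time: <time_step>'."""
    target = 'time: ' + time_step
    state = 'seek_time'          # seek_time -> seek_start -> in_block
    block = []
    for raw in lines:
        line = raw.rstrip('\n')
        if state == 'seek_time':
            if line.strip() == target:
                state = 'seek_start'
        elif state == 'seek_start':
            if line.strip() == 'C-FXY':
                state = 'in_block'
        else:  # in_block
            if line.strip() == 'E-END':
                return block     # stop reading as soon as the block is complete
            block.append(line)
    return None                  # time step or a complete block was not found

# Hypothetical sample data with the same layout as in the question.
sample_lines = """header
time: 150
C-FXY
first block
E-END
time: 5000
C-FXY
wanted line 1
wanted line 2
E-END
time: 13000
C-FXY
last block
E-END
"""

print(extract_block_by_lines(io.StringIO(sample_lines), '5000'))
```

Because the function returns at the first E-END after the requested time step, the rest of the input is never read, which helps with large files.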
You can solve it using the following code:
import re
text = """(A lot of information)
time: 150
C-FXY
-- information ---
E-END
(A lot of information)
time: 5000
C-FXY
**--- INFORMATION I WANT TO EXTRACT ---**
E-END
(A lot of information)
time: 13000
C-FXY
-- information ---
E-END
(A lot of information)"""
# Compile once with re.DOTALL so that . also matches newlines, then reuse the pattern.
pattern = re.compile(r"C-FXY(.*?)E-END", re.DOTALL)
results = pattern.findall(text)
Now, if you print the results:
for i, r in enumerate(results):
    print(f"Result {i}:\n'{r}'")
The output would be:
Result 0:
'
-- information ---
'
Result 1:
'
**--- INFORMATION I WANT TO EXTRACT ---**
'
Result 2:
'
-- information ---
'
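Note that this returns every block, so you still have to know that the one you want is at index 1. If you would rather look a block up by its time step, one possible variation is to capture the time value as well and build a dictionary. This is only a sketch: blocks_by_time and the sample text are hypothetical, and it assumes every time line is followed by exactly one C-FXY ... E-END block before the next time line.

```python
import re

# Capture the time value and the block body that follows it.
block_re = re.compile(
    r'^time:\s*(\d+)\s*$.*?C-FXY\s*(.*?)\s*E-END',
    re.DOTALL | re.MULTILINE,
)

def blocks_by_time(text):
    """Map each time step (as a string) to the trimmed text of its block."""
    return {time: body for time, body in block_re.findall(text)}

# Hypothetical sample data with the same layout as in the question.
sample_text = """header
time: 150
C-FXY
first block
E-END
filler
time: 5000
C-FXY
wanted block
E-END
filler
time: 13000
C-FXY
last block
E-END
"""

blocks = blocks_by_time(sample_text)
print(blocks['5000'])
```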