简体   繁体   中英

Text between pattern match RegEx Python

I need some help with the following pattern, I am struggling many hours now. I have a text like:

<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols

By Using the pattern:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)

The datetime and everything between the arrows is matched correctly.Unfortunately, I can not find a way to extract the text between the patterns.The final groups should look like (left_arrows), (datetime), (user), (right_arrows), (text).The closer I got was by using:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}))

But it doesn't match the first and the last correctly. Click Here to check the result(pythex.org)

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{0,3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}|$))
                                                                                                ^^

You need to give |$ for the last line to match.See demo.

https://regex101.com/r/fM9lY3/51

I think the easiest way will be to go over the file line by line and try to match them with different regexes, one for header lines and one for text lines. But if you really need to get it in one shot, you could do:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)\n\*+([^\*]+)\*+\n

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM