Text between pattern matches with RegEx in Python

I need some help with the following pattern; I have been struggling with it for many hours now. I have a text like:

<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols

Using the pattern:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)

This matches the datetime and everything between the arrows correctly. Unfortunately, I cannot find a way to extract the text between the matches. The final groups should look like (left_arrows), (datetime), (user), (right_arrows), (text). The closest I got was with:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}))

But it doesn't match the first and the last records correctly. Click here to check the result (pythex.org).

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{0,3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}|$))
                                                                                                ^^

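You need to add |$ so that the last record matches as well: the lookahead (?=<<\d{2}/\d{2}) requires another header after the text, and the last record has none, so $ lets it stop at the end of the input. The pattern above also relaxes \D{3} to \D{0,3}, since the first header has no timezone. See demo: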

https://regex101.com/r/fM9lY3/51
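A minimal Python sketch of the |$ fix, run with re.findall on the sample text from the question. One deliberate change (an assumption, not part of the answer): the mid-pattern (?s) is replaced by the scoped flag group (?s:...), available since Python 3.6, because Python 3.11+ raises re.error for global inline flags that are not at the start of the pattern, and older versions applied them to the whole pattern rather than just the text group. The names PATTERN and parse_records, and the whitespace stripping, are illustrative choices.

```python
import re

# Same structure as the answer's pattern, split into commented pieces.
# Change: "(?s)" in the middle became the scoped group "(?s:...)", so DOTALL
# applies only to the text group (5), not to the header groups.
PATTERN = re.compile(
    r"(<<)"                                                 # 1: left arrows
    r"(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{0,3}.*?(?=\s))"   # 2: datetime, optional timezone
    r"\s(.*?(?=>>))"                                        # 3: user
    r"(>>)"                                                 # 4: right arrows
    r"((?s:.*?)(?=<<\d{2}/\d{2}|$))"                        # 5: text up to next header or end
)

SAMPLE = """<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols"""


def parse_records(text):
    """Return one (left_arrows, datetime, user, right_arrows, text) tuple per header."""
    records = []
    for left, dt, user, right, body in PATTERN.findall(text):
        # Group 2 ends with a space when there is no timezone, and group 5
        # keeps the surrounding newlines, so strip the text fields.
        records.append((left, dt.strip(), user.strip(), right, body.strip()))
    return records


for record in parse_records(SAMPLE):
    print(record)
```

Stripping is done in Python rather than in the pattern to keep the regex close to the answer's version.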

I think the easiest way is to go over the file line by line and match each line against different regexes, one for header lines and one for text lines. But if you really need to get it in one shot, you could do:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)\n\*+([^\*]+)\*+\n
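The line-by-line approach can be sketched in Python as follows. This is a minimal sketch under stated assumptions: headers always start at the beginning of a line, timezones look like CET or GMT+1 (three capital letters with an optional offset), and every non-header line belongs to the preceding header. HEADER and parse_lines are hypothetical names. Unlike the one-shot regex above, which only captures text wrapped in asterisks, this collects every text line, including text on the same line as the header.

```python
import re

# Hypothetical header regex: "<<", date and time, an optional timezone such as
# "CET" or "GMT+1" (assumed: 3 capital letters plus an optional offset),
# the user name, ">>", then any text on the same line.
HEADER = re.compile(
    r"<<(?P<datetime>\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}(?:\s+[A-Z]{3}(?:[+-]\d{1,2})?)?)"
    r"\s+(?P<user>[^>]+)>>"
    r"\s?(?P<rest>.*)"
)

SAMPLE = """<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols"""


def parse_lines(lines):
    """Group lines into (datetime, user, text) records, one per header line."""
    records = []  # each entry: [datetime, user, list_of_text_lines]
    for line in lines:
        line = line.rstrip("\n")  # file objects keep the trailing newline
        match = HEADER.match(line)
        if match:
            # A header starts a new record; same-line text is its first text line.
            rest = match.group("rest")
            records.append([match.group("datetime"), match.group("user"), [rest] if rest else []])
        elif records:
            # Any other line belongs to the most recent header.
            records[-1][2].append(line)
        # Lines before the first header are skipped (assumption).
    return [(dt, user, "\n".join(text)) for dt, user, text in records]


for record in parse_lines(SAMPLE.splitlines()):
    print(record)
```

With a real file, pass the open file object instead, e.g. with open(path) as f: records = parse_lines(f), where path is a placeholder for your file name.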

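Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.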

© 2020-2024 STACKOOM.COM