
Python Regular Expression: Multiline pattern match with more than two substrings

I want to use a regex to find merge conflicts in a file.

I've found previous posts that show how to find a pattern that matches this structure:

FIRST SUBSTRING 
/* several 
    new 
     lines 
*/
SECOND SUBSTRING

which works with the following regex: (^FIRST SUBSTRING)(.+)((?:\n.+)+)(SECOND SUBSTRING)

However, I need to match this pattern:

FIRST SUBSTRING 
/* several 
    new 
     lines 
*/
SECOND SUBSTRING
/* several 
    new 
     lines 
*/
THIRD SUBSTRING

where the first, second, and third substrings are <<<<<<< , ======= , and >>>>>>> respectively.

I gave (^<<<<<<<)(.+)((?:\n.+)+)(=======)(.+)((?:\n.+)+)(>>>>>>) a shot, but it does not work, as you can see in this demo. ( (^<<<<<<<)(.+)((?:\n.+)+)(=======) does work, but it is not exactly what I am looking for.)

Your expression does work with a couple of slight changes. First, the marker lengths do not exactly match: your closing group (>>>>>>) has six characters, while the marker has seven. Second, with (.+) you are asking for at least one character after the SECOND SUBSTRING, when there are none in the text.

(<<<<<<<)(.+)((?:\n.+)+)(=======)(.*)((?:\n.+)+)(>>>>>>>)
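The difference between (.+) and (.*) after the separator can be checked in isolation. This is only a minimal sketch: the variable names are made up for illustration, and it tests a bare ======= line on its own rather than a whole conflict block.

```python
import re

separator_line = "======="  # a separator line with nothing after the marker

# `.+` demands at least one more character, so the bare marker is rejected.
strict = re.fullmatch(r"(=======)(.+)", separator_line)

# `.*` also accepts zero characters, so the bare marker is accepted.
relaxed = re.fullmatch(r"(=======)(.*)", separator_line)

print(strict)                   # None
print(repr(relaxed.group(2)))   # ''
```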

From then on, it makes groups as you expect (which the answer in the comments does not). You probably want to distinguish between your code and their code.

Plus, if you have to choose among working expressions, I would choose yours over the proposed alternatives, for readability. Regexes are not friendly things to read, and using repetitions (among other sophistications) makes them harder to read. The same goes for ?: ; just query the specific groups you need, there is no need to avoid group creation there.
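To illustrate the point about ?: , here is a small sketch on a made-up one-line input (the string "name=value" and the variable names are assumptions, not part of the question). With every parenthesis capturing, you simply read the group numbers you care about; with (?:...) the group is not created and the numbering shifts.

```python
import re

line = "name=value"  # hypothetical sample input

all_capturing = re.match(r"(\w+)(=)(\w+)", line)
with_non_capturing = re.match(r"(\w+)(?:=)(\w+)", line)

# Every parenthesis captures: read groups 1 and 3, ignore group 2.
print(all_capturing.group(1), all_capturing.group(3))            # name value

# With (?:...) the separator is not captured, so the value is group 2.
print(with_non_capturing.group(1), with_non_capturing.group(2))  # name value
```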

Setting the s flag (single line: dot matches newline; re.DOTALL in Python) is needed to match the text inside the structure. With it, you can use .*? to select multi-line text across \n , up to the next pattern ( ? makes the match lazy). With this flag set, the regex below matches what you need.

(<{7})(.*?)(={7})(.*?)(>{7})(.*?\n)
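A minimal sketch of how this could be used from Python, assuming re.DOTALL as the equivalent of the s flag. The sample text and the branch names are invented for illustration. The main pattern uses the lazy .*? for both bodies; for comparison, a variant whose first body group is a greedy (.*) is also run, since under DOTALL a greedy group runs on to the last ======= in the text when there is more than one conflict.

```python
import re

# Hypothetical file content with two conflicts; branch names are made up.
text = (
    "<<<<<<< HEAD\n"
    "ours 1\n"
    "=======\n"
    "theirs 1\n"
    ">>>>>>> feature\n"
    "unchanged line\n"
    "<<<<<<< HEAD\n"
    "ours 2\n"
    "=======\n"
    "theirs 2\n"
    ">>>>>>> feature\n"
)

lazy = re.compile(r"(<{7})(.*?)(={7})(.*?)(>{7})(.*?\n)", re.DOTALL)
greedy = re.compile(r"(<{7})(.*)(={7})(.*?)(>{7})(.*?\n)", re.DOTALL)

conflicts = list(lazy.finditer(text))
print(len(conflicts))                     # 2
print(len(list(greedy.finditer(text))))   # 1, both conflicts merged into one match

for m in conflicts:
    # group 2: text between <<<<<<< and =======, group 4: between ======= and >>>>>>>
    print(repr(m.group(2)), repr(m.group(4)))
```

Note that group 2 also contains the rest of the opening marker line (here " HEAD"), and that the final (.*?\n) needs a newline after the closing marker line.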

