
Python: regex to match between matches

The text file output has syntax of the form word> data <word, and the angle brackets need to be removed. The data part can be pretty much anything (and of variable length), including newlines, spaces, dots, letters, etc. Currently I am using...

text = re.sub("(>)(.{1,10})(<)", r"\2", text)

...but it has obvious limitations, one being length. The reason for not using * is that there are some restrictions, namely:

  • no other > or < can be present inside the match except at its boundaries
  • only 1 number inside the match may be a single digit; that is, dog> 7 4^ 8 0 . 2 1 6? <cat and exam> 1961 5 . 66 9 <ple shall not match, while test> 0? <string and over> 1980 31, 6 000 <flow are fine and their brackets shall be removed
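For reference, the limits of the .{1,10} attempt quoted earlier can be seen in a quick sketch. The helper name strip_original and the sample strings other than dog> 7 4 <cat are made up for illustration; only the pattern itself comes from the question.

```python
import re

# The pattern from the question: 1 to 10 characters between > and <.
ORIGINAL = r"(>)(.{1,10})(<)"

def strip_original(text):
    """Hypothetical helper: apply the question's substitution."""
    return re.sub(ORIGINAL, r"\2", text)

# 7 characters of data: the brackets are removed.
short = strip_original("a> short <b")
# More than 10 characters of data: nothing matches, text is unchanged.
long_ = strip_original("a> much longer data <b")
# "." does not match a newline unless re.DOTALL is set: unchanged.
multi = strip_original("a> x\ny <b")
# The digit restriction is not enforced at all: brackets are removed.
digits = strip_original("dog> 7 4 <cat")
```

So besides the length cap, this version also misses multi-line data and strips brackets from segments that should be left alone.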

How can this be approached?

Why not like this?

text = re.sub(r">((?:[^<>\d]|\d{2,})*)<", r"\1", text)

(?:[^<>\d]|\d{2,})* matches either any character except angle brackets or digits ( [^<>\d] ) or a run of digits as long as there are at least two ( \d{2,} ), repeatedly ( * ).
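A minimal sketch of how this pattern behaves, assuming it is applied with re.sub as in the answer. The helper name strip_two_plus and the shortened sample strings are illustrative only.

```python
import re

# Pattern from the answer: between > and <, allow characters that are
# neither angle brackets nor digits, or runs of two or more digits.
TWO_PLUS = re.compile(r">((?:[^<>\d]|\d{2,})*)<")

def strip_two_plus(text):
    """Hypothetical helper: remove the brackets where the pattern matches."""
    return TWO_PLUS.sub(r"\1", text)

# Only multi-digit numbers inside: brackets are removed.
multi_digit = strip_two_plus("over> 1980 31 <flow")
# Several lone digits inside: no match, text is unchanged.
lone_digits = strip_two_plus("dog> 7 4 <cat")
# Even one lone digit blocks the match with this pattern.
one_digit = strip_two_plus("test> 0? <string")
```

Note that this version rejects every isolated digit, so a segment such as test> 0? <string, which the question wants stripped, is left untouched.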

Since none of the answerers updated their answers after the one edit to the question, I had to post another question to cover that part and actually finish the regexp.

At last, the final code I'm using is this:

text = re.sub(r">((?!(?:[^<]*\b\d\b){2})[^><]*)<", r"\1", text)

It allows for only one single-digit number and no angle brackets inside the match, but otherwise catches anything else.
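As a rough check, the final pattern can be run against the four examples from the question. This sketch assumes single backslashes inside the raw string (\b, \d, \1); the helper name strip_brackets is made up for illustration.

```python
import re

# Negative lookahead: reject the segment if two standalone single digits
# (\b\d\b) occur before the next "<"; otherwise take everything that is
# not an angle bracket.
FINAL = re.compile(r">((?!(?:[^<]*\b\d\b){2})[^><]*)<")

def strip_brackets(text):
    """Hypothetical helper: remove the brackets where the pattern matches."""
    return FINAL.sub(r"\1", text)

# One standalone digit at most: brackets are removed.
ok_zero = strip_brackets("test> 0? <string")
ok_flow = strip_brackets("over> 1980 31, 6 000 <flow")
# Two or more standalone digits: text is left unchanged.
bad_dog = strip_brackets("dog> 7 4^ 8 0 . 2 1 6? <cat")
bad_exam = strip_brackets("exam> 1961 5 . 66 9 <ple")
```

Because \b is a word boundary, a digit only counts as standalone when it is not adjacent to another digit, a letter, or an underscore.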
