How to match and replace an undefined number of occurrences of a pattern using re.sub

I wish to match a pattern in some text which occurs 0-n times, and replace the text when it does.

Here is some sample text:

XYZ

WWW

OOO

|NOTE:

ABC

DEF

GHI

3+|

HERE

I want to convert the above text to the following (I only wish to convert the part between "|NOTE:" and "3+|"):

XYZ

WWW

OOO

|NOTE:ABCDEFGHI

3+|

HERE

Where the text above is contained in "input_txt", I can do it with the following code:

import re

input_txt = re.sub(
    r'\|(NOTE):\n*(.*)\n*(.*)\n*(.*)(\n*[0-9]*[\+]*[\|]*)',
    r'|\1:\2\3\4\5',
    input_txt
)

However, this code only works if there are three \n-separated paragraphs after the "|NOTE:" text. How do I change the pattern so that it will match and replace any number of \n characters? I would prefer to do this with re.sub if possible (for my own interest, as this is an issue I have come across before without knowing how to solve it), but I would also be open to other suggestions of how it might better be done.

Try:

input_txt = re.sub(r'\n+([^0-9|])', r'\1', input_txt)

Output:

|NOTE:ABCDEFGHI
3+|
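
Note that the pattern above is not limited to the note: it removes any run of newlines that is followed by a character other than a digit or "|", so lines outside the "|NOTE:" ... "3+|" span (such as "XYZ" and "WWW" in the sample) would be joined as well. If only the note should change, one possible approach is to pass a function to re.sub and strip the newlines inside each matched block. The following is a minimal sketch, not a definitive solution; the name collapse_note_block is hypothetical, and it assumes every block opens with "|NOTE:" and closes with digits followed by "+|", as in the sample.

```python
import re


def collapse_note_block(text):
    """Remove newlines only between '|NOTE:' and the closing '<digits>+|'."""
    # Group 1: the opening marker.
    # Group 2: the note body, matched lazily across lines (re.DOTALL).
    # Group 3: the newlines before the closing marker plus the marker itself,
    #          kept untouched so the marker stays on its own line.
    pattern = re.compile(r'(\|NOTE:)(.*?)(\n*\d+\+\|)', re.DOTALL)

    def _join(match):
        # Strip every newline from the body; leave both markers as they were.
        body = match.group(2).replace('\n', '')
        return match.group(1) + body + match.group(3)

    return pattern.sub(_join, text)


# Sample input, assuming the paragraphs are separated by blank lines.
input_txt = "XYZ\n\nWWW\n\nOOO\n\n|NOTE:\n\nABC\n\nDEF\n\nGHI\n\n3+|\n\nHERE"
input_txt = collapse_note_block(input_txt)
print(input_txt)
```

Because the body group is lazy, each "|NOTE:" is paired with the nearest following closing marker, so any number of paragraphs inside the note is handled and text outside the note is left alone.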

