简体   繁体   English

正则表达式 - 将字符串中的 \\n 和 \n 替换为<br>但不是 \\\\n

[英]Regex - Replace \\n and \n in string by <br> but not \\\\n

I am trying to replace all \\n and \n from a string by <br> but I dont want the \\\\n to be replaced.我正在尝试用<br>替换字符串中的所有\\n\n ,但我不希望替换\\\\n

Im trying with the following expression using negative lookbehind but it does not return the correct result because all the "n" and "\" are replaced:我尝试使用以下表达式使用否定的lookbehind,但它没有返回正确的结果,因为所有的"n""\"都被替换了:

import re
string = "this is a sample\\nstring\\\\ncontaining\nall cases\n\\n\n!"
re.sub(r"(?<!\\)([\\n\n]+)", r"<br>", string)
>>'this is a sample<br>stri<br>g<br>co<br>tai<br>i<br>g<br>all cases<br>!'

expected output预计 output

"this is a sample<br>string\\\\ncontaining<br>all cases<br><br><br>!"

This will do the magic:这会变魔术:

re.sub(r"(?<!\\)\\n|\n", "<br>", string)

Note: it will replace line breaks ("\n") and escaped line breaks ("\n" or r"\n").注意:它将替换换行符(“\n”)和转义换行符(“\n”或r“\n”)。 It does not escape "\\n" (or r"\n").它不会转义“\\n”(或 r“\n”)。 "\\\n" (backslash + new line) becomes "\\< br>". “\\\n”(反斜杠 + 新行)变为“\\<br>”。

Maybe, what you really want is:也许,你真正想要的是:

re.sub(r"(?<!\\)(\\\\)*\\n|\n", "\1<br>", string)

This replaces all new lines and all escaped n (r"\n").这将替换所有新行和所有转义的 n (r"\n")。 r"\\n" is not replaced. r"\\n" 不会被替换。 r"\\\n" is again replaced (escaped backslash + escaped n). r"\\\n" 再次被替换(转义的反斜杠 + 转义的 n)。

Your regex has the character class [\\n\n] which matches \ , n , or \n .您的正则表达式具有匹配\n\n的字符 class [\\n\n] Your lookbehind logic is correct, you just need to change your character class to a different subpattern: \\{1,2}n .您的后视逻辑是正确的,您只需要将角色 class 更改为不同的子模式: \\{1,2}n

See regex in use here请参阅此处使用的正则表达式

(?<!\\)\\{1,2}n
  • (?< \\) Negative lookbehind ensuring what precedes is not \ (?< \\)否定后视确保前面的不是\
  • \\{1,2} Match \ once or twice \\{1,2}匹配\一次或两次
  • n Match this literally n从字面上匹配

Replacement: <br>替换: <br>

Alternative: (?<?\\)\\\\ n as provided by @revo in the comments below the question替代方案: @revo在问题下方的评论中提供的(?<?\\)\\\\ n


Usage in code代码中的用法

See in use here 在这里查看使用中

import re

r = re.compile(r"(?<!\\)\\{1,2}n")
s = r"this is a sample\\nstring\\\\ncontaining\nall cases\n\\n\n!"
print(r.sub("<br>", s, 0))

Result: this is a sample<br>string\\\\ncontaining<br>all cases<br><br><br>结果: this is a sample<br>string\\\\ncontaining<br>all cases<br><br><br>

A regex version with a lookbehind is the shortest solution:带有后视功能的正则表达式版本是最短的解决方案:

import re
s = "this is a sample\\nstring\\\\ncontaining\nall cases and one more\\n\nadded"
print(re.sub(r"(?<!\\)\\n|\n", "<br>", s))
## => this is a sample<br>string\\ncontaining<br>all cases and one more<br><br>added

See this Pyhton demo .请参阅此 Pyhton 演示

Details细节

  • (?< \\)\\n - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a \ symbol, and then a \n 2-char string (?< \\)\\n - 如果紧挨当前位置的左侧有一个\符号,然后是一个\n 2-char 字符串,则匹配失败
  • \n - an LF symbol. \n - LF 符号。

Here is a regex demo with the string from the OP.这是一个带有来自 OP 的字符串的正则表达式演示

An alternative is to match and capture \\n that you want to keep and then match \n and LF without capturing, and use a lambda expression as the replacement argument to implement custom replacement logic:另一种方法是匹配并捕获要保留的\\n ,然后匹配\n和 LF 而不捕获,并使用 lambda 表达式作为替换参数来实现自定义替换逻辑:

import re
s = "this is a sample\\nstring\\\\ncontaining\nall cases and one more\\n\nadded"
print(re.sub(r"(\\\\n)|\\n|\n", lambda x: r"<br>" if not x.group(1) else x.group(), s))
# => this is a sample<br>string\\ncontaining<br>all cases and one more<br><br>added

See the Python demo .请参阅Python 演示

Here, (\\\\n)|\\n|\n matches and captures \\n with (\\\\n) (that will be re-inserted into the result ( if not x.group(1) else x.group() = if Group 1 matches, replace with its contents), or matches \n (with \\n ) or LF (with \n ) and they will be replaced with <br> .在这里, (\\\\n)|\\n|\n匹配并捕获\\n(\\\\n) (这将被重新插入到结果中( if not x.group(1) else x.group() = 如果第 1 组匹配,则替换为其内容),或匹配\n (与\\n )或 LF(与\n ),它们将被替换为<br>

Maybe this will find what you want, and than you can replace with <br> :也许这会找到你想要的,而不是你可以用<br>替换:
(?<?\\)(\\)n|( < \\)(\\\\)n

上面提到的魔法和代码行在某些情况下具有相似的输出,但它没有在数据集中的选定列表中提供所需的输出,其中格式在几次迭代后发生了变化。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM