
Regex - Replace \\n and \n in string by <br> but not \\\\n

I am trying to replace every \\n and \n in a string with <br>, but I don't want \\\\n to be replaced.

I'm trying the following expression with a negative lookbehind, but it does not return the correct result: every "n" and "\" gets replaced as well:

import re
string = "this is a sample\\nstring\\\\ncontaining\nall cases\n\\n\n!"
re.sub(r"(?<!\\)([\\n\n]+)", r"<br>", string)
>>'this is a sample<br>stri<br>g<br>co<br>tai<br>i<br>g<br>all cases<br>!'

Expected output:

"this is a sample<br>string\\\\ncontaining<br>all cases<br><br><br>!"

This will do the magic:

re.sub(r"(?<!\\)\\n|\n", "<br>", string)

Note: it replaces line breaks ("\n") and escaped line breaks ("\\n" or r"\n"). It does not replace "\\\\n" (or r"\\n"). "\\\n" (backslash + newline) becomes "\\<br>".
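The cases listed in the note can be checked one at a time. This is only a small sketch: the helper name to_br and the short sample strings are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: backslash + n not preceded by a backslash, or a real line feed.
PATTERN = re.compile(r"(?<!\\)\\n|\n")

def to_br(text):
    """Hypothetical helper: apply the answer's substitution to `text`."""
    return PATTERN.sub("<br>", text)

print(to_br("a\nb"))     # real line feed         -> a<br>b
print(to_br(r"a\nb"))    # backslash + n          -> a<br>b
print(to_br(r"a\\nb"))   # two backslashes + n    -> a\\nb (unchanged)
print(to_br("a\\\nb"))   # backslash + line feed  -> a\<br>b
```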

Maybe, what you really want is:

re.sub(r"(?<!\\)((?:\\\\)*)\\n|\n", r"\1<br>", string)

This replaces all new lines and all escaped n (r"\n"). r"\\n" is not replaced. r"\\\n" is again replaced (escaped backslash + escaped n).
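To see how the number of backslashes in front of n changes the result, here is a sketch of the same idea. It assumes a raw replacement string and a single group that captures the whole run of backslash pairs, so that every pair is put back. The helper name to_br_keep_pairs is hypothetical.

```python
import re

# Assumed variant: group 1 captures ALL escaped backslashes (pairs) before the escaped n.
PATTERN = re.compile(r"(?<!\\)((?:\\\\)*)\\n|\n")

def to_br_keep_pairs(text):
    # When the line-feed branch matches, group 1 did not participate;
    # Python 3.5+ substitutes an empty string for it.
    return PATTERN.sub(r"\1<br>", text)

print(to_br_keep_pairs("a\nb"))      # line feed            -> a<br>b
print(to_br_keep_pairs(r"a\nb"))     # 1 backslash + n      -> a<br>b
print(to_br_keep_pairs(r"a\\nb"))    # 2 backslashes + n    -> a\\nb (unchanged)
print(to_br_keep_pairs(r"a\\\nb"))   # 3 backslashes + n    -> a\\<br>b
```

An odd number of backslashes means the last one escapes the n, so it is replaced; an even number means every backslash is itself escaped, so nothing changes.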

Your regex has the character class [\\n\n], which matches a single \, a single n, or a line feed. Your lookbehind logic is correct; you just need to replace the character class with a sequence subpattern: \\{1,2}n.
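The difference between a character class and a sequence can be seen with re.findall. This is a minimal sketch on a made-up sample string:

```python
import re

sample = "pin\\n"  # the five characters p, i, n, backslash, n

# A character class matches ONE character out of the set {backslash, n, line feed}.
print(re.findall(r"[\\n\n]", sample))  # ['n', '\\', 'n']

# A sequence matches backslash followed by n, as a two-character unit.
print(re.findall(r"\\n", sample))      # ['\\n']
```

This is why every plain "n" in the question's string was replaced too.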

See regex in use here

(?<!\\)\\{1,2}n
  • (?<!\\) Negative lookbehind ensuring what precedes is not \
  • \\{1,2} Match \ once or twice
  • n Match this literally

Replacement: <br>

Alternative: (?<!\\)\\\\?n as provided by @revo in the comments below the question


Usage in code

See in use here

import re

r = re.compile(r"(?<!\\)\\{1,2}n")
s = r"this is a sample\\nstring\\\\ncontaining\nall cases\n\\n\n!"
print(r.sub("<br>", s, 0))

Result: this is a sample<br>string\\\\ncontaining<br>all cases<br><br><br>!
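Note that the sample here is a raw string literal, so it does not contain the same characters as the non-raw literal in the question. The sketch below uses shortened, made-up strings (plain and raw are illustrative names) to show how the same pattern behaves on each reading:

```python
import re

plain = "a\\nb\\\\nc\nd"  # backslash+n, two backslashes+n, real line feed
raw = r"a\\nb\\\\nc\nd"   # two backslashes+n, four backslashes+n, backslash+n

print(len(plain), len(raw))  # 10 14

pattern = re.compile(r"(?<!\\)\\{1,2}n")

# Raw reading: one or two backslashes + n are replaced, four backslashes + n are kept.
print(repr(pattern.sub("<br>", raw)))

# Non-raw reading: two backslashes + n IS replaced and the line feed is left alone.
print(repr(pattern.sub("<br>", plain)))
```

So this pattern fits when the text literally contains the characters shown in the raw literal. If the text contains real line feeds, a pattern with an explicit \n alternative is needed.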

A regex version with a lookbehind is the shortest solution:

import re
s = "this is a sample\\nstring\\\\ncontaining\nall cases and one more\\n\nadded"
print(re.sub(r"(?<!\\)\\n|\n", "<br>", s))
# => this is a sample<br>string\\ncontaining<br>all cases and one more<br><br>added

See this Python demo.

Details

  • (?<!\\)\\n - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a \ symbol, and then a \n 2-char string
  • \n - an LF symbol.

Here is a regex demo with the string from the OP.

An alternative is to match and capture \\n that you want to keep and then match \n and LF without capturing, and use a lambda expression as the replacement argument to implement custom replacement logic:

import re
s = "this is a sample\\nstring\\\\ncontaining\nall cases and one more\\n\nadded"
print(re.sub(r"(\\\\n)|\\n|\n", lambda x: r"<br>" if not x.group(1) else x.group(), s))
# => this is a sample<br>string\\ncontaining<br>all cases and one more<br><br>added

See the Python demo.

Here, (\\\\n)|\\n|\n either matches and captures \\n with (\\\\n), or matches \n (with \\n) or LF (with \n) without capturing. In the lambda, r"<br>" if not x.group(1) else x.group() puts the match back unchanged when Group 1 matched, and returns <br> otherwise.
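The same replacement logic may be easier to read with a named function instead of a lambda. The sketch below (function names are hypothetical) also checks that the lookbehind version and the callback version agree on the sample string:

```python
import re

def with_lookbehind(text):
    # Backslash + n not preceded by a backslash, or a real line feed.
    return re.sub(r"(?<!\\)\\n|\n", "<br>", text)

def with_callback(text):
    def repl(match):
        # Group 1 is set only when two backslashes + n were matched: keep them as-is.
        return match.group() if match.group(1) else "<br>"
    return re.sub(r"(\\\\n)|\\n|\n", repl, text)

s = "this is a sample\\nstring\\\\ncontaining\nall cases and one more\\n\nadded"
print(with_callback(s))
print(with_lookbehind(s) == with_callback(s))  # True
```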

Maybe this will find what you want, and then you can replace it with <br>:
(?<!\\)(\\)n|(?<!\\)(\\\\)n

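The "magic" one-liner and the other code above produce similar output in some cases, but they did not give the desired output for a selected list in my dataset, where the format changed after a few iterations.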
