
How to locate the end of the line in regex?

I have the following regex: in = in.replaceAll(" d+\n", "");

I wanted to use it to get rid of the "d" at the end of lines:

But I just won't do that d
<i>I just won't do that</i> d

No, no-no-no, no, no d

What is wrong with my regex in = in.replaceAll(" d+\n", "");?

Most probably your lines are separated not only with \n but with \r\n. You can try \r?\n to optionally match a \r before the \n. Let's also not forget the last d, which has no line separator after it. To handle it, add $ to your regex, an anchor that represents the end of your data. So your final pattern could look like:

in.replaceAll(" d+(\r?\n|$)", "")
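To try this pattern out, here is a minimal sketch that runs it against the sample lines from the question, assuming they are joined with Windows-style \r\n. The class and method names are made up for illustration, and the pattern is written with regex escapes (\\r, \\n) rather than literal control characters, which match the same text.

```java
import java.util.regex.Pattern;

class StripTrailingD {
    // " d+" followed by either an optional CR plus LF, or the end of the input.
    private static final Pattern TRAILING_D = Pattern.compile(" d+(\\r?\\n|$)");

    // Removes the trailing " d" together with the line separator after it.
    static String strip(String in) {
        return TRAILING_D.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample text from the question, assumed here to use \r\n separators.
        String in = "But I just won't do that d\r\n"
                  + "<i>I just won't do that</i> d\r\n"
                  + "No, no-no-no, no, no d";
        System.out.println(strip(in));
    }
}
```

Because the line separators are part of the match, they are removed too, so the three lines come out joined into one.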

If you don't want to remove the line separators, you can use the end-of-line anchor $ with the MULTILINE flag (?m) instead of matching the separators explicitly:

in.replaceAll("(?m) d+$", "")

especially because there is no line separator after the last d.
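A small sketch of the MULTILINE variant, again with made-up class and method names, shows that the line separators survive the replacement:

```java
class KeepSeparators {
    // (?m) enables MULTILINE, so $ also matches just before each line terminator.
    static String strip(String in) {
        return in.replaceAll("(?m) d+$", "");
    }

    public static void main(String[] args) {
        String in = "But I just won't do that d\r\n"
                  + "<i>I just won't do that</i> d\r\n"
                  + "No, no-no-no, no, no d";
        // Each line loses its trailing " d" but keeps its \r\n.
        System.out.println(strip(in));
    }
}
```

Since $ is zero-width, only the " d" is consumed, and the same pattern covers the last line, which ends at the end of the input rather than at a separator.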


In Java, when the MULTILINE flag is specified, $ matches the empty string:

  • Before a line terminator:
    • A carriage-return character followed immediately by a newline character ("\r\n")
    • A newline (line feed) character ('\n') that is not immediately preceded by a carriage return ('\r')
    • A standalone carriage-return character ('\r')
    • A next-line character ('\u0085')
    • A line-separator character ('\u2028')
    • A paragraph-separator character ('\u2029')
  • At the end of the string

When the UNIX_LINES flag is specified along with MULTILINE, $ matches the empty string only right before a newline ('\n') or at the end of the string.
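The difference between the two flag combinations can be seen on text with \r\n separators. The following is a sketch with a hypothetical helper that compiles the same pattern under different flags:

```java
import java.util.regex.Pattern;

class UnixLinesDemo {
    // Applies " d+$" with the given Pattern flags and removes every match.
    static String strip(String in, int flags) {
        return Pattern.compile(" d+$", flags).matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        String crlf = "one d\r\ntwo d";

        // MULTILINE alone: $ matches before the \r\n pair, so both lines are cleaned.
        String multiline = strip(crlf, Pattern.MULTILINE);

        // MULTILINE | UNIX_LINES: only \n counts as a line terminator, so the \r
        // sitting between "d" and \n blocks the match on the first line.
        String unixLines = strip(crlf, Pattern.MULTILINE | Pattern.UNIX_LINES);

        System.out.println(multiline.equals("one\r\ntwo"));   // prints true
        System.out.println(unixLines.equals("one d\r\ntwo")); // prints true
    }
}
```

With UNIX_LINES set, the pattern would need to account for the \r itself (for example " d+\\r?$") to handle Windows-style input.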


Anyway, if possible, don't use regex with HTML.

As Pshemo states in his answer, your string most likely contains Windows-style line endings, which are \r\n rather than just \n.

You can modify your regex to account for both kinds of newline (plus the case where the string ends with a d and no newline) with this code:

in = in.replaceAll("(d+(?=\r\n)|d+(?=\n)|d+$)","");

This regex will remove anything that matches d+ followed by \r\n, d+ followed by \n, or d+$ (any d before the end of the String).
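A quick sketch of how the lookahead version behaves, with illustrative class and method names. Note that the pattern as written does not include the leading space, so the space before each d stays in the output; the second method is a hypothetical variant, not part of the answer, that folds the space into the match.

```java
class LookaheadDemo {
    // The pattern from this answer: the lookaheads check for the separator without consuming it.
    static String strip(String in) {
        return in.replaceAll("(d+(?=\\r\\n)|d+(?=\\n)|d+$)", "");
    }

    // Hypothetical variant: also consumes the space before the d's.
    static String stripWithSpace(String in) {
        return in.replaceAll(" d+(?=\\r?\\n|$)", "");
    }

    public static void main(String[] args) {
        String in = "But I just won't do that d\r\n"
                  + "No, no-no-no, no, no d";
        System.out.println(strip(in));          // trailing d removed, space and \r\n kept
        System.out.println(stripWithSpace(in)); // trailing " d" removed, \r\n kept
    }
}
```

Without the leading space, the first pattern also trims a d that ends an ordinary word at the end of a line (for example "and" becomes "an"), which may or may not be what you want.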

