How to make dot match newline characters using regular expressions

I have a string that contains normal characters, whitespace characters and newline characters between <div> and </div>. This regular expression doesn't work: /<div>(.*)<\/div>/. It is because .* doesn't match newline characters. My question is: how do I do this?

You need to use the DOTALL modifier, which in PHP/PCRE is the s flag after the closing delimiter:

'/<div>(.*)<\/div>/s'
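For comparison, here is a minimal sketch of the same idea in Python. Its re module has no pattern delimiters, so the flag is passed as re.DOTALL instead of a trailing s. The input string is a made-up example.

```python
import re

# Made-up input: the <div> content spans two lines.
html = "<div>first line\nsecond line</div>"

# Without DOTALL, '.' stops at the newline, so the pattern cannot reach </div>.
no_flag = re.search(r"<div>(.*)</div>", html)

# With DOTALL (Python's spelling of the PCRE 's' modifier), '.' also matches '\n'.
with_flag = re.search(r"<div>(.*)</div>", html, re.DOTALL)

print(no_flag)                  # None
print(repr(with_flag.group(1))) # 'first line\nsecond line'
```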

This might not give you exactly what you want, because the match is greedy: .* runs on to the last </div> in the string. You might instead try a non-greedy match:

'/<div>(.*?)<\/div>/s'
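A small Python sketch (made-up input containing two divs) showing the difference: the greedy version swallows everything up to the last </div>, while the non-greedy one stops at the first.

```python
import re

# Made-up input with two separate divs.
html = "<div>one\n</div>\n<div>two</div>"

# Greedy: .* grabs as much as possible, then backtracks only to the LAST </div>.
greedy = re.findall(r"<div>(.*)</div>", html, re.DOTALL)

# Non-greedy: .*? grabs as little as possible, so each div is matched separately.
lazy = re.findall(r"<div>(.*?)</div>", html, re.DOTALL)

print(greedy)  # ['one\n</div>\n<div>two']
print(lazy)    # ['one\n', 'two']
```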

You could also solve this by matching everything except '<', if there are no other tags inside the div:

'/<div>([^<]*)<\/div>/'
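A Python sketch of the negated-class approach, assuming made-up inputs: [^<] already includes the newline, so no flag is needed, but the match fails as soon as another tag appears inside the div.

```python
import re

# [^<] matches any character except '<', including '\n', so no DOTALL flag is needed.
html = "<div>line one\nline two</div>"
m = re.search(r"<div>([^<]*)</div>", html)
print(repr(m.group(1)))  # 'line one\nline two'

# It fails as soon as the content contains another tag, because of the '<' in <b>.
nested = "<div>see <b>this</b></div>"
nested_match = re.search(r"<div>([^<]*)</div>", nested)
print(nested_match)  # None
```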

Another observation is that you don't need to use / as your regular expression delimiter. Using another character means you don't have to escape the / in </div>, which improves readability. This applies to all the regular expressions above. Here's how it would look if you use '#' instead of '/':

'#<div>([^<]*)</div>#'

However, all these solutions can fail due to nested divs, extra whitespace, HTML comments and various other things. HTML is too complicated to parse with regex, so you should consider using an HTML parser instead.
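As one illustration of the parser route, here is a minimal sketch using Python's standard-library html.parser.HTMLParser. The class name DivTextCollector is hypothetical. It collects only the text of each top-level div and tracks nesting depth, so an inner div or a commented-out </div> does not end the match early, whereas the non-greedy regex stops at the first </div> it sees.

```python
import re
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside each top-level <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # how many <div> elements we are currently inside
        self.current = []  # text pieces of the top-level div being read
        self.results = []  # finished texts, one per top-level div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:  # closed the outermost div
                self.results.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:  # ignore text outside any div
            self.current.append(data)


# Made-up input: a nested div and a comment containing a fake closing tag.
html = "<div>outer\n<div>inner</div>\n<!-- </div> --></div>"

parser = DivTextCollector()
parser.feed(html)
parser.close()
print(parser.results)  # ['outer\ninner\n']

# The non-greedy regex stops at the inner </div> instead.
regex_result = re.search(r"<div>(.*?)</div>", html, re.DOTALL).group(1)
print(repr(regex_result))  # 'outer\n<div>inner'
```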

To match all characters, you can use this trick:

%\<div\>([\s\S]*)\</div\>%
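The trick works because [\s\S] means "any whitespace or any non-whitespace character", which covers every character including the newline, without any flag. A minimal Python sketch, assuming a made-up input:

```python
import re

html = "<div>a\nb</div>"

# [\s\S] is a class that contains every character, so it matches '\n' too.
# Like .*, the * here is greedy; use *? if there can be several divs.
m = re.search(r"<div>([\s\S]*)</div>", html)
print(repr(m.group(1)))  # 'a\nb'
```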

I know this is an old question, but I stumbled across it recently. You can also use the (?s) mode modifier, e.g.:

'/(?s)<div>(.*?)<\/div>/'
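In Python the inline flag goes at the very start of the pattern, since there are no delimiters. Python 3.6 and later also accept a scoped form, (?s:...), which applies DOTALL only inside that group. A minimal sketch with a made-up input:

```python
import re

html = "<div>a\nb</div>"

# Global inline flag: (?s) at the start of the pattern enables DOTALL everywhere.
m1 = re.search(r"(?s)<div>(.*?)</div>", html)

# Scoped inline flag: DOTALL applies only to the part inside (?s:...).
m2 = re.search(r"<div>((?s:.*?))</div>", html)

print(repr(m1.group(1)))  # 'a\nb'
print(repr(m2.group(1)))  # 'a\nb'
```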

Maybe I'm missing the obvious, but is there any problem with just doing the following?

(.|\n)

This matches either any character except a newline, or a newline, so every character. Solved it for me, at least.
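A Python sketch of the same alternation, assuming a made-up input. It wraps the alternation in a non-capturing group (?:.|\n) inside the capture; with a plain (.|\n)* the group would only hold the last character matched. Matching character by character through an alternation is typically slower than DOTALL or [\s\S] on long inputs.

```python
import re

html = "<div>a\nb</div>"

# (?:.|\n) is "any non-newline character, or a newline", i.e. any character.
# The outer (...) captures the whole run; the inner group is non-capturing.
m = re.search(r"<div>((?:.|\n)*?)</div>", html)
print(repr(m.group(1)))  # 'a\nb'

# With a capturing (.|\n)*, group(1) keeps only the last repetition.
last_only = re.search(r"<div>(.|\n)*</div>", html)
print(repr(last_only.group(1)))  # 'b'
```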

An option would be:

'/<div>(\n*|.*)<\/div>/i'

Which would match either newline or the dot identifier matches.这将匹配任何新行或点标识符匹配。

There is usually a flag in the regular expression compiler that tells it the dot should match newline characters.
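In Python, for example, that flag is re.DOTALL (alias re.S), and it can be combined with other flags using |. A minimal sketch; the name DIV_CONTENT is just an illustrative choice:

```python
import re

# Compile once with DOTALL; IGNORECASE plays the role of PHP's /i modifier.
DIV_CONTENT = re.compile(r"<div>(.*?)</div>", re.DOTALL | re.IGNORECASE)

html = "<DIV>a\nb</DIV>"
found = DIV_CONTENT.findall(html)
print(found)                                # ['a\nb']
print(bool(DIV_CONTENT.flags & re.DOTALL))  # True
```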

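Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.
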
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM