
How to make dot match newline characters using regular expressions

I have a string that contains normal characters, whitespace, and newline characters between <div> and </div>. This regular expression doesn't work: /<div>(.*)<\/div>/. That is because .* doesn't match newline characters. How can I make the dot match them?

You need to use the DOTALL modifier.

'/<div>(.*)<\/div>/s'
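To illustrate what the modifier changes, here is a minimal sketch. It assumes Python's `re` module, where `re.DOTALL` plays the role of the `s` modifier; the sample string is made up.

```python
import re

html = "<div>first line\nsecond line</div>"

# Without DOTALL, "." stops at the newline, so the pattern never reaches </div>.
no_flag = re.search(r"<div>(.*)</div>", html)
print(no_flag)

# With re.DOTALL (Python's counterpart of the s modifier), "." also matches "\n".
with_flag = re.search(r"<div>(.*)</div>", html, re.DOTALL)
captured = with_flag.group(1)
print(repr(captured))
```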

This might not give you exactly what you want because you are greedy matching. You might instead try a non-greedy match:

'/<div>(.*?)<\/div>/s'
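The difference between the greedy and non-greedy versions shows up as soon as the input contains more than one div. A small sketch, again assuming Python's `re` module and a made-up input:

```python
import re

html = "<div>one\n</div><p>gap</p><div>two</div>"

# Greedy: ".*" runs to the LAST </div>, swallowing everything in between.
greedy = re.findall(r"<div>(.*)</div>", html, re.DOTALL)

# Non-greedy: ".*?" stops at the FIRST </div> after each <div>.
lazy = re.findall(r"<div>(.*?)</div>", html, re.DOTALL)

print(greedy)
print(lazy)
```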

You could also solve this by matching everything except '<' if there aren't other tags:

'/<div>([^<]*)<\/div>/'
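A negated character class matches newlines without any modifier, since a newline is simply "not a <". The sketch below (Python's `re` module, made-up inputs) also shows the limitation mentioned above: it stops working once the div contains another tag.

```python
import re

pattern = re.compile(r"<div>([^<]*)</div>")

# "[^<]" matches any character except "<", newlines included.
plain = pattern.search("<div>line one\nline two</div>")
plain_text = plain.group(1)

# A nested tag introduces a "<" before </div>, so the pattern fails.
nested = pattern.search("<div>some <b>bold</b> text</div>")

print(repr(plain_text))
print(nested)
```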

Another observation is that you don't need to use / as your regular expression delimiter. Using another character means that you don't have to escape the / in </div>, which improves readability. This applies to all of the above regular expressions. Here's how it would look if you use '#' instead of '/':

'#<div>([^<]*)</div>#'

However all these solutions can fail due to nested divs, extra whitespace, HTML comments and various other things. HTML is too complicated to parse with Regex, so you should consider using an HTML parser instead.
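As a rough illustration of the parser approach, here is a sketch using `html.parser` from Python's standard library. The `DivTextCollector` class is a hypothetical helper written for this example, not part of any library, and it only gathers the text of each outermost div.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside each outermost <div>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <div> elements are currently open
        self.current = []   # text chunks of the div being read
        self.results = []   # finished text, one entry per outermost div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        # Comments are routed to handle_comment, so they never land here.
        if self.depth > 0:
            self.current.append(data)


collector = DivTextCollector()
collector.feed('<div class="x">outer\n<!-- note --><div>inner</div> tail</div>')
collector.close()
print(collector.results)
```

Nested divs, attributes and comments are handled by the parser rather than by the pattern, which is exactly where the regular expressions above break down.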

To match all characters, you can use this trick:

%\<div\>([\s\S]*)\</div\>%
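The trick works because `\s` (whitespace) and `\S` (non-whitespace) together cover every character, so no modifier is needed. A quick check, assuming Python's `re` module and a made-up input:

```python
import re

html = "<div>a\nb</div>"

# No flags: the class [\s\S] matches any character, including "\n".
match = re.search(r"<div>([\s\S]*)</div>", html)
body = match.group(1)
print(repr(body))
```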

I know this is an old question, but I stumbled across it recently. You can also use the (?s) mode modifier, e.g.:

/(?s)<div>(.*?)<\/div>/
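For comparison, a sketch of the inline modifier in Python's `re` module, which has no delimiters. There the `(?s)` must sit at the very start of the pattern string; the input is made up.

```python
import re

html = "<div>a\nb</div><div>c</div>"

# "(?s)" at the start of the pattern turns on dot-matches-newline,
# with no separate flags argument.
first = re.search(r"(?s)<div>(.*?)</div>", html)
first_body = first.group(1)
print(repr(first_body))
```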

Maybe I'm missing the obvious, but is there any problem with just doing

(.|\n)

? This matches either any character except newline or a newline, so every character. Solved it for me, at least.
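One caveat: the group matches a single character, so it needs a quantifier to stand in for `.*`. A small sketch, assuming Python's `re` module, with a non-capturing inner group added here so the capture holds the whole body:

```python
import re

html = "<div>a\nb</div>"

# A bare (.|\n) consumes exactly one character, so this finds nothing.
single = re.search(r"<div>(.|\n)</div>", html)

# Repeating the alternation makes it behave like a newline-aware ".*?".
repeated = re.search(r"<div>((?:.|\n)*?)</div>", html)
inner = repeated.group(1)

print(single)
print(repr(inner))
```

On long inputs the alternation tends to be slower than a DOTALL dot or `[\s\S]`, because the engine tries a branch for every character.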

An option would be:

'/<div>(\n*|.*)<\/div>/i'

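This matches either a run of newlines or whatever the dot matches. Note that the two alternatives cannot be mixed within one match, so content that contains both text and newlines needs (\n|.)* instead.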
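To see how this alternation behaves on different inputs, here is a sketch assuming Python's `re` module, with `re.IGNORECASE` standing in for the `i` modifier and made-up inputs:

```python
import re

pattern = re.compile(r"<div>(\n*|.*)</div>", re.IGNORECASE)

# Only newlines inside the div: the first alternative matches.
only_newlines = pattern.search("<div>\n\n</div>")

# Only non-newline text: the second alternative matches, in any letter case.
only_text = pattern.search("<DIV>text</DIV>")

# Text AND a newline: neither alternative can cover both, so no match.
mixed = pattern.search("<div>text\nmore</div>")

print(repr(only_newlines.group(1)), repr(only_text.group(1)), mixed)
```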

Regular expression compilers usually have a flag that tells them the dot should match newline characters.
