
Match all the text except HTML tag content

I want to match the text outside the HTML div tag in the example below. What regex pattern should I use? Thanks!

 Match me 1 <div>Hello World.</div> Match me 2.

Update: This is free text, not well-formed HTML, but it contains custom/HTML tags. I need to extract the text that is not inside a tag for further processing.

Try to use this pattern:

(^([\s\S]*?)(?=<div>))|(((?<=<\/div>))([\s\S]*?)(?=<div>))|((?<=<\/div>)[\s\S]*)
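
As a rough check, the pattern above can be tried in Python's `re` module. This is only a sketch: the helper name `text_outside_div` and the sample strings are made up for illustration, and the pattern is copied unchanged from the answer.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(
    r"(^([\s\S]*?)(?=<div>))|(((?<=<\/div>))([\s\S]*?)(?=<div>))|((?<=<\/div>)[\s\S]*)"
)

def text_outside_div(text):
    """Return the non-empty pieces of text that lie outside <div>...</div>."""
    # finditer + group(0) gives whole matches; findall would return
    # tuples of the individual groups for this pattern.
    return [m.group(0) for m in PATTERN.finditer(text) if m.group(0)]

sample = "Match me 1 <div>Hello World.</div> Match me 2."
pieces = text_outside_div(sample)
print(pieces)  # ['Match me 1 ', ' Match me 2.']
```

Note that a string containing no `<div>` at all produces no matches, because every alternative in the pattern depends on a tag being present.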

^ Matches the beginning of the string

\s Matches any whitespace character (spaces, tabs, line breaks)

\S Matches any non-whitespace character

[\s\S] Combining the two in a character class matches any character at all, including line breaks

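* Matches the preceding element zero or more times; a following ? makes it non-greedy, so it matches the minimum number of characters required

| Alternation: matches the pattern on either side of it

() Groups the enclosed expression and captures what it matches

(?=<div>) A positive lookahead: the match must be immediately followed by <div>, but <div> itself is not included in the match

(?<=<\/div>) A positive lookbehind: the match must be immediately preceded by </div>, which is likewise not included in the match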
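
To illustrate that the lookaround constructs only check their surroundings without consuming them, here is a small Python sketch. The simplified patterns and variable names are assumptions chosen for the demonstration, not part of the answer's pattern.

```python
import re

text = "Match me 1 <div>Hello World.</div> Match me 2."

# Lookahead: the matched text must be followed by <div>,
# but <div> is not part of the match.
before = re.search(r"[^<]*(?=<div>)", text)

# Lookbehind: the matched text must be preceded by </div>,
# which is not part of the match either.
after = re.search(r"(?<=</div>)[^<]*", text)

print(repr(before.group(0)))  # 'Match me 1 '
print(repr(after.group(0)))   # ' Match me 2.'

# The tag is still in the string right where the first match ended.
print(text[before.end():before.end() + 5])  # <div>
```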

Match me1 <div><div>Hello World!</div> Match me 2 <div>Hello World!</div> Match me 3.

By default, quantifiers are greedy, meaning they match as much as possible. With a greedy [\s\S]* the pattern above would therefore select all the text up to the third (last) <div>, whereas adding the non-greedy modifier ? makes it select only the text up to the first <div>.
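
The difference between greedy and non-greedy matching can be seen in a minimal Python sketch. The sample string and variable names here are illustrative assumptions, and only the first alternative of the pattern is reproduced.

```python
import re

text = ("Match me 1 <div>Hello World!</div> "
        "Match me 2 <div>Hello World!</div> Match me 3.")

# Greedy: [\s\S]* grabs as much as it can and backtracks
# only as far as the LAST <div>.
greedy = re.match(r"[\s\S]*(?=<div>)", text).group(0)

# Non-greedy: [\s\S]*? grows one character at a time and
# stops at the FIRST <div>.
lazy = re.match(r"[\s\S]*?(?=<div>)", text).group(0)

print(repr(greedy))  # 'Match me 1 <div>Hello World!</div> Match me 2 '
print(repr(lazy))    # 'Match me 1 '
```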


 