Regex to match “>” in HTML

Question

I need a regex that matches the ">" character in an HTML string but doesn't match a tag's closing bracket. Example:

<span id="bla"> bla bla a > b bla bla bla <a>bla </a> </span>

The regex should match the ">" between a and b.

Answer 1

You can use a negative lookbehind: (?<!\\<[^>]+)\\> (untested).

This will match any > character that isn't preceded by the beginning of an HTML tag (a sequence starting with < and not containing >).
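Note that this relies on a variable-length lookbehind, which the .NET regex engine accepts but Python's built-in re module rejects ("look-behind requires fixed-width pattern"). As a rough sketch of the same idea for Python, and not the pattern above, one could swap in an alternation: consume whole tags first and capture only a ">" that is left over. The names TAG_OR_BARE_GT and bare_gt_positions are made up for this illustration.

```python
import re

# Alternation instead of lookbehind (an assumption of this sketch):
# either a whole tag, which is consumed and ignored,
# or a bare '>' outside any tag, which is captured in group 1.
TAG_OR_BARE_GT = re.compile(r"<[^<>]+>|(>)")

def bare_gt_positions(html):
    """Return the indices of '>' characters that do not close a tag."""
    return [
        m.start(1)
        for m in TAG_OR_BARE_GT.finditer(html)
        if m.group(1) is not None
    ]

html = '<span id="bla"> bla bla a > b bla bla bla <a>bla </a> </span>'
positions = bare_gt_positions(html)
print(positions)
```

Like the lookbehind, this only treats "<", some non-bracket characters, then ">" as a tag, so comments, scripts, or attribute values containing ">" would still confuse it.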

Answer 2

The following regular expression should work:

([^/]>)+
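A quick check against the question's example, sketched here in Python, suggests this pattern is broader than what was asked for: it matches any ">" whose preceding character is not "/", which appears to include the closing brackets of ordinary tags.

```python
import re

html = '<span id="bla"> bla bla a > b bla bla bla <a>bla </a> </span>'

# Collect every substring the pattern matches in the example string.
matches = [m.group(0) for m in re.finditer(r"([^/]>)+", html)]
print(matches)
```

If the list contains more than the single " >" from "a > b", the extra entries are tag endings, so the pattern would need further restriction before it could be used for this task.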

Answer 3

What you need is a regex that finds "unpaired" greater-than signs; >s that are not preceded by a < as you'd find in a tag.

Try this: "(?<!\\<[^<>]+)\\>". It should match a greater-than that is not part of an HTML tag, where a tag is a construct consisting of a less-than, some number of characters other than the angle-bracket characters, then a greater-than.

EDIT: put in SLak's suggestions. I'll keep the < in the "not match" block just in case the less-than being matched is also not part of a tag, for instance << or <-. It shouldn't hurt the pattern's ability to match proper tags.

Answer 4

A specific solution rather than just an admonition:

"Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away." - http://www.crummy.com/software/BeautifulSoup/

Don't use regex to parse HTML:

"Among programmers of any experience, it is generally regarded as A Bad Idea to attempt to parse HTML with regular expressions." - Link

and "You can't parse [X]HTML with regex" - 4352 votes at the time of this posting

"Parsing HTML is a solved problem. You do not need to solve it. You just need to be lazy. Be lazy, use ..." something designed for that purpose.
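To make the parser route concrete, here is a minimal sketch. It uses Python's standard-library html.parser rather than Beautiful Soup, purely so that it runs without installing anything; the class TextCollector and the function text_outside_tags are hypothetical names for this example. The idea is to let the parser separate tags from text, then look for ">" only in the text.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects the text that appears between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for text outside of tags.
        self.chunks.append(data)

def text_outside_tags(html):
    """Return all text content of the markup, with the tags removed."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)

html = '<span id="bla"> bla bla a > b bla bla bla <a>bla </a> </span>'
text = text_outside_tags(html)
print(text.count(">"))
```

Any ">" that survives in the collected text was not part of a tag, so no regex over the raw markup is needed.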

4 answers

solution1
1 ACCEPTED 2011-02-22 15:29:53

solution2
0 2011-02-22 15:27:31

solution3
0 2011-02-22 15:29:57

solution4
0 2011-02-22 15:30:39
