Is it possible in PHP to find all < (less-than angle bracket, "triangle") and > characters that do not belong to valid HTML elements (they could be stored in an array)? I'd like to mask these characters automatically.
Example:
$html = '<div class="some class"><pre>5 < 8</pre></div>';
$triangles = getAllTriangles($html);
where getAllTriangles($html) finds only one triangle (the one between 5 and 8), so it could be masked with &lt; while the others stay as they are to get the right output.
EDIT: Actually, my problem comes from PHP's DOMDocument and its parser. If I read an HTML string like the one above
$html = '<div class="some class"><pre>5 < 8</pre></div>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$output = $doc->saveHTML();
This will result in
<div class="some class"><pre>5 </pre></div>
because of the triangle. That is why I'd like to mask these characters automatically. It would be a real problem to mask them by hand in the HTML strings I'm reading. Once all triangles are masked, I could use DOMDocument the way I'd like to.
What I really want is a regular expression that replaces all triangles that don't belong to HTML tags. The output for the example above would be:
<div class="some class"><pre>5 &lt; 8</pre></div>
More examples:
input: <pre>while i < 10 do....</pre>
output: <pre>while i &lt; 10 do....</pre>
input: <div><button-1></div>
output: <div>&lt;button-1&gt;</div>
You could try to strip all HTML tags from your string and use simple string functions to find the < and > characters in the result:
$html = '<div class="some class"><pre>5 < 8</pre></div>';
$no_html = strip_tags($html);
var_dump($no_html);
$count = substr_count($no_html, '<');
var_dump($count);
However, please note that this approach may fail, because your "HTML" string is not valid HTML: any < and > that are not part of HTML tags should be encoded as &lt; and &gt;.
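For illustration only, here is a minimal sketch of what that encoding looks like when the text content is still available separately from the markup. That is an assumption and may not match your situation; the variable names are made up for the example.

```php
<?php
// Assumed scenario: the raw text is known before it is placed into the markup.
$text = '5 < 8';

// htmlspecialchars() turns "<" and ">" into "&lt;" and "&gt;".
$html = '<div class="some class"><pre>' . htmlspecialchars($text) . '</pre></div>';

echo $html; // <div class="some class"><pre>5 &lt; 8</pre></div>
```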
If you need something other than the count, I would recommend using an HTML parser instead of regular expressions, and possibly applying regular expressions to the contents you find with the parser. The same caveat about invalid HTML applies here as well.
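If a regular expression is still what you want as a pre-processing step before DOMDocument, the following is a rough sketch, not a robust solution. It assumes you can supply an allow-list of tag names you consider valid, that attribute values never contain < or >, and PHP 7.4 or later. The function name maskStrayAngleBrackets and its $allowedTags parameter are hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: escape "<" and ">" that are not part of an allow-listed tag.
function maskStrayAngleBrackets(string $html, array $allowedTags): string
{
    // Build an alternation such as "div|pre" from the allow-list.
    $names = implode('|', array_map(fn($t) => preg_quote($t, '~'), $allowedTags));

    // Matches an opening or closing tag whose name is in the allow-list.
    // (?![\w-]) keeps "pre" from matching the start of "<prefix>" or "<pre-x>".
    $tagPattern = '~(</?(?:' . $names . ')(?![\w-])[^<>]*>)~i';

    // Split into alternating text and tag pieces, keeping the tags.
    $parts = preg_split($tagPattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // With a single capture group, even indexes are text and odd indexes are tags.
        if ($i % 2 === 0) {
            $parts[$i] = str_replace(['<', '>'], ['&lt;', '&gt;'], $part);
        }
    }
    return implode('', $parts);
}

$allowed = ['div', 'pre'];
echo maskStrayAngleBrackets('<div class="some class"><pre>5 < 8</pre></div>', $allowed), "\n";
// <div class="some class"><pre>5 &lt; 8</pre></div>
echo maskStrayAngleBrackets('<div><button-1></div>', $allowed), "\n";
// <div>&lt;button-1&gt;</div>
```

The allow-list is what lets <button-1> be treated as text: syntactically it looks like a tag, so a pattern without a list of known names could not tell it apart from a real element.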