I would like to remove the whitespace between HTML tags using a regular expression in PHP, without removing the spaces inside the text. What pattern should I use?
For example, I would like to remove the whitespace between the <tr> and <td> tags.
From:
<tr>
<td>Hello there</td>
</tr>
to:
<tr><td>Hello there</td></tr>
Thanks.
First off: markup (HTML) and regex don't mix well. Be that as it may, you can remove whitespace between tags quite easily with the following regex:
$clean = preg_replace('/>\s+</', '><', $string);
This removes any run of whitespace (spaces, tabs, line breaks) found between two tags, provided there's nothing else in between:
<p>Foobar <b>is</b> not a word <i>as such</i> </p>
will be "translated" into:
<p>Foobar <b>is</b> not a word <i>as such</i></p>
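Applied to the markup from the question, the same pattern can be sketched like this. The $string value is just the question's example written as a PHP string with the line breaks spelled out, so treat it as an assumption about what the real input looks like:

```php
<?php
// Input from the question: a line break separates each pair of tags.
$string = "<tr>\n<td>Hello there</td>\n</tr>";

// Collapse every whitespace-only gap between a '>' and the next '<'.
$clean = preg_replace('/>\s+</', '><', $string);

echo $clean, PHP_EOL; // <tr><td>Hello there</td></tr>
```

The space inside "Hello there" survives because it is not directly enclosed by a '>' and a '<'.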
That's fine, but it would still be better (and safer) to parse, sanitize and then echo the markup using the DOMDocument class.
But before you start hacking away and write thousands of lines of code to ensure you're processing valid markup, ask yourself this simple question:
Instead of writing code that works around bad markup, can you make sure the data you're processing is of good quality to begin with?
Anyway, here's a simple example of how to use the DOMDocument class:
$dom = new DOMDocument;
$dom->loadHTML($string);
echo $dom->saveHTML();//echoes sanitized markup
This assumes $string is a full document (including <html>, a doctype and all the other tags that implies). If you don't have such a string, you'll have to pass the node you want to saveXML:
echo $dom->saveXML($dom->getElementsByTagName('body')->item(0));
Where body is the root node of your markup. Note that saveXML is a method of the document, not of the node, so the node goes in as an argument. See the docs for examples and details.
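For a fragment, one possible way to get the markup back without the <html> and <body> wrapper that the parser adds is to serialize the children of body one by one. This is only a sketch: $fragment and $inner are made-up names, and it relies on saveHTML accepting a node argument (PHP 5.3.6 and later):

```php
<?php
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$fragment = '<p>Hello <b>there</b></p>'; // hypothetical input without <html> or <body>

$dom = new DOMDocument;
$dom->loadHTML($fragment); // the parser wraps the fragment in <html><body>

$body = $dom->getElementsByTagName('body')->item(0);

$inner = '';
foreach ($body->childNodes as $child) {
    // Serialize each child on its own, which leaves the wrapper elements out.
    $inner .= $dom->saveHTML($child);
}

echo $inner, PHP_EOL;
```

Keep in mind that the serializer may add line breaks of its own between block-level elements, so the output is not guaranteed to be whitespace-free.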
If the string you have is exactly what you've included in your question, then all whitespace needs to be removed. In that case, regex is just not necessary:
$string = '<tr>
<td>';
echo str_replace(array(' ', "\t", "\r", "\n"), '', $string); // removes all spaces, tabs and line breaks
Ah well, browse through the documentation of the DOMDocument class; it's worth the effort. Honest :)
This question is more complicated than it looks. It's easy to remove all spaces between all tags, like
<tr> <td> -> <tr><td>
but this naive approach will produce wrong results for inline elements, where the space is significant and removing it glues the words together:
<i>hi</i> <b>there</b> -> <i>hi</i><b>there</b>
To remove whitespace correctly you have to analyze the type of its parent node and only remove it when that node doesn't allow text content (http://www.w3.org/TR/html4/sgml/dtd.html might be helpful).
Definitely not something you can achieve with a regular expression!
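With a DOM parser, though, the parent check is doable. Below is a rough sketch of the idea, not a complete solution: removeStructuralWhitespace is a hypothetical helper, and the NO_TEXT_PARENTS list is a hand-picked subset of elements that take no direct text, not a full reading of the DTD:

```php
<?php
// Assumption: an illustrative subset of elements whose content model has no direct text.
const NO_TEXT_PARENTS = ['table', 'thead', 'tbody', 'tfoot', 'tr', 'ul', 'ol', 'dl', 'select'];

// Hypothetical helper: drops whitespace-only text nodes under those parents, in place.
function removeStructuralWhitespace(DOMDocument $dom): void
{
    $xpath = new DOMXPath($dom);
    // Copy the node list into an array first, since nodes are removed while looping.
    $blank = iterator_to_array($xpath->query('//text()[normalize-space(.) = ""]'));
    foreach ($blank as $text) {
        if (in_array($text->parentNode->nodeName, NO_TEXT_PARENTS, true)) {
            $text->parentNode->removeChild($text);
        }
    }
}

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$dom = new DOMDocument;
$dom->loadHTML("<table>\n<tr>\n<td><i>hi</i> <b>there</b></td>\n</tr>\n</table>");
removeStructuralWhitespace($dom);

// The space between the two inline elements is left alone.
$td = $dom->getElementsByTagName('td')->item(0);
echo $td->textContent, PHP_EOL; // hi there
```

The whitespace directly under <table> and <tr> is gone, while the space inside <td> stays because td is not in the list. The parser itself may already discard some of that structural whitespace, so the helper mostly acts as a safety net.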
$str = "<td> </td>";
$str2 = "<td></td>";
var_dump(preg_match('/\s/',$str));
var_dump(preg_match('/\s/',$str2));
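The first call prints int(1), because $str contains whitespace.
The second call prints int(0), because $str2 contains none. Note that preg_match returns 1 or 0 (or false on error), not a boolean.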