
Remove whitespace between HTML tags in PHP with a regular expression

I would like to remove the whitespace between HTML tags with a regular expression in PHP, without removing the spaces inside the text. What pattern should I use?

For example, I would like to remove the whitespace between the <tr> and <td> tags in particular.

From:

<tr>
    <td>Hello there</td>
</tr>

to:

<tr><td>Hello there</td></tr>

Thanks.

First off: markup (HTML) and regex don't mix well. Be that as it may, you can remove the whitespace between tags quite easily with the following regex:

$clean = preg_replace('/>\s+</', '><', $string);

This removes whitespace between two tags only when nothing else sits between them:

<p>Foobar <b>is</b> not a word <i>as such</i>    </p>

will be "translated" into:

<p>Foobar <b>is</b> not a word <i>as such</i></p>
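Applied to the markup from the question, the same pattern can be tried out as follows. This is only a minimal sketch: the input string is retyped by hand from the question (with the closing </tr> tag assumed), and the variable names are illustrative.

```php
<?php
// Hypothetical input mirroring the table row in the question.
$string = "<tr>\n    <td>Hello there</td>\n</tr>";

// Replace any run of whitespace that sits directly between '>' and '<'.
// The space inside "Hello there" is not between two tags, so it survives.
$clean = preg_replace('/>\s+</', '><', $string);

echo $clean, PHP_EOL; // <tr><td>Hello there</td></tr>
```

Note that \s also matches newlines and tabs, which is why the line break and indentation between the tags disappear as well.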

That's fine, but still, it'd be better (and safer) to parse, sanitize and then echo the markup using the DOMDocument class. But before you start hacking away and write thousands of lines of code to ensure you're processing valid markup, ask yourself this simple question:

How can I make sure that the markup I'm processing is well-formed, and valid to begin with?

Instead of writing code that works around bad markup, look into ways of making sure the data you're processing is of good quality to begin with.
Anyway, here's a simple example of how to use the DOMDocument class:

$dom = new DOMDocument;
$dom->loadHTML($string);
echo $dom->saveHTML();//echoes sanitized markup

This assumes $string holds a full document (including <html>, the doctype and all the other tags that implies). If you don't have such a string, pass the node you want to output to saveXML:

echo $dom->saveXML($dom->getElementsByTagName('body')->item(0));

Here body is the root node of your markup. See the docs for examples and details.
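For a bare fragment, one way to get back only your own markup, without the <html> and <body> wrapper that the parser adds, is to serialize the children of body one by one. The sketch below assumes a hypothetical fragment and relies on saveHTML() accepting a node argument (available since PHP 5.3.6); treat it as an illustration rather than the only way to do it.

```php
<?php
// Hypothetical fragment without <html>, <body> or a doctype.
$string = '<p>Foobar <b>is</b> not a word</p>';

$dom = new DOMDocument;
// The parser wraps a bare fragment in <html><body>...</body></html>.
$dom->loadHTML($string);

$body = $dom->getElementsByTagName('body')->item(0);

// Serialize each child of <body> separately, so the wrapper is left out.
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveHTML($child);
}

echo $fragment, PHP_EOL;
```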

If the string you have is exactly what you've included in your question, then all whitespace needs to be removed. In that case, regex is just not necessary:

$string = '<tr>
     <td>';
// remove spaces, tabs and line breaks; replacing ' ' alone would leave the newline in place
echo str_replace(array(' ', "\t", "\r", "\n"), '', $string);

Ah well, browse through the documentation of the DOMDocument class, it's worth the effort. Honest :)

This question is more complicated than it looks. It's easy to remove all spaces between all tags, like

<tr>  <td>   -> <tr><td>

but this naive approach produces wrong results for inline elements, where the space between the tags is rendered as part of the text:

<i>hi</i> <b>there</b>  -> <i>hi</i><b>there</b>

To remove whitespace correctly, you have to look at the type of its parent node and remove it only when that element doesn't allow text content (the HTML 4 DTD at http://www.w3.org/TR/html4/sgml/dtd.html might be helpful).

Definitely not something you can achieve with a regular expression!
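A parent-aware cleanup along those lines could be sketched with DOMDocument and DOMXPath. The list of "structural" elements below is a hand-picked assumption for illustration, not something derived from the DTD, and the sample markup is hypothetical.

```php
<?php
// Assumption: a hand-picked list of elements that do not allow text content.
$structural = array('table', 'thead', 'tbody', 'tfoot', 'tr', 'ul', 'ol');

// Hypothetical input: whitespace inside <tr> is layout, inside <td> it is text.
$html = "<table><tr>\n    <td>Say <i>hi</i> <b>there</b>!</td>\n</tr></table>";

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every text node that consists of whitespace only.
$blank = $xpath->query('//text()[normalize-space(.) = ""]');

// Copy the list to an array first, then remove only the nodes whose
// parent element is in the structural list.
foreach (iterator_to_array($blank) as $node) {
    if (in_array($node->parentNode->nodeName, $structural, true)) {
        $node->parentNode->removeChild($node);
    }
}

$table = $dom->getElementsByTagName('table')->item(0);
echo $dom->saveHTML($table), PHP_EOL;
```

The space between </i> and <b> is kept because its parent is <td>, which allows text, while the line breaks and indentation directly inside <tr> are dropped.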

$str = "<td> </td>";
$str2 = "<td></td>";

var_dump(preg_match('/\s/',$str));
var_dump(preg_match('/\s/',$str2));

The first call prints int(1), because the string contains whitespace.

The second call prints int(0), because it does not.
