Remove space in string in php with regular expression
I would like to remove the whitespace between HTML tags with a regular expression in PHP, without removing the spaces inside the text. What pattern should I use?
For example, I would particularly like to remove the space between the <tr> and <td> tags.
From:
<tr>
<td>Hello there</td>
</tr>
to:
<tr><td>Hello there</td></tr>
Thanks.
First off: markup (HTML) and regex don't mix well. Be that as it may, you can remove the whitespace between tags quite easily with the following regex:
$clean = preg_replace('/>\s+</', '><', $string);
This removes whitespace found between two tags, provided there is nothing else between them:
<p>Foobar <b>is</b> not a word <i>as such</i> <p>
will be "translated" into:
<p>Foobar <b>is</b> not a word <i>as such</i><p>
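Applied to markup like the one in the question, the same pattern can be tried out as follows. This is only a minimal sketch; the sample input (with newlines and indentation between the tags) is my assumption of what the real string looks like.

```php
<?php
// Assumed input: the markup from the question, with whitespace between the tags.
$string = "<tr>\n    <td>Hello there</td>\n</tr>";

// \s+ matches one or more whitespace characters (spaces, tabs, newlines),
// but only when the run sits directly between '>' and '<'.
// The space in "Hello there" is not touched, because text surrounds it.
$clean = preg_replace('/>\s+</', '><', $string);

echo $clean, PHP_EOL; // <tr><td>Hello there</td></tr>
```

Note that the pattern knows nothing about which tags it is looking at, so it will also join inline elements that were separated by a meaningful space.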
That's fine, but still, it'd be better (and safer) to parse, sanitize and then echo the markup using the DOMDocument class. But before you start hacking away and write thousands of lines of code to ensure you're processing valid markup, ask yourself this simple question:
Instead of writing code that works around bad markup, look into ways of making sure the data you're processing is of good quality to begin with.
Anyway, here's a simple example of how to use the DOMDocument class:
$dom = new DOMDocument;
$dom->loadHTML($string);
echo $dom->saveHTML();//echoes sanitized markup
This assumes that $string is a complete document (including <html>, a doctype and all the other tags that implies). If you don't have such a string, you'll have to use saveXML:
echo $dom->saveXML($dom->getElementsByTagName('body')->item(0));//saveXML is a DOMDocument method, the node is passed as its argument
Where body is the root node of your markup. See the docs for examples and details.
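For a fragment, one way to put this together is sketched below. The sample string and the variable names are assumptions of mine, and looping over the children of body (rather than dumping body itself) is just one option for leaving the wrapper tags out of the output.

```php
<?php
// Assumed input: a fragment without <html>, <head> or <body>.
$string = '<p>Foobar <b>is</b> not a word</p>';

$dom = new DOMDocument;
$dom->loadHTML($string); // the parser adds the missing wrapper elements itself

$body = $dom->getElementsByTagName('body')->item(0);

// Serialize only the children of <body>, so the added wrappers are left out.
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveXML($child);
}

echo $fragment, PHP_EOL;
```

Keep in mind that saveXML serializes as XML, so an empty element would come out self-closed (for example <td/>).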
If the string you have is what you've included in your question, then all spaces need to be removed. In that case, regex is just not necessary:
$string = '<tr> <td>';
echo str_replace(' ', '', $string);//removes all spaces...
Ah well, browse through the documentation of the DOMDocument class; it's worth the effort. Honest :)
This question is more complicated than it looks. It's easy to remove all whitespace between all tags, like
<tr> <td> -> <tr><td>
but this naive approach will produce wrong results:
<i>hi</i> <b>there</b> -> <i>hi</i><b>there</b>
To remove whitespace correctly you have to look at the type of its parent node and only remove it when that node doesn't allow text content (http://www.w3.org/TR/html4/sgml/dtd.html might be helpful).
Definitely not something you can achieve with a regular expression!
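A rough sketch of the parent-aware approach, using DOMDocument and DOMXPath, could look like this. The function name stripStructuralWhitespace and the list of elements are hypothetical and deliberately incomplete; a real list would have to be derived from the DTD linked above.

```php
<?php
// Hypothetical helper: the name and the element list are assumptions for illustration.
function stripStructuralWhitespace(string $html): string
{
    // Partial list of elements that do not allow text content.
    $noText = ['table', 'thead', 'tbody', 'tfoot', 'tr', 'ul', 'ol', 'dl', 'select'];

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Collect whitespace-only text nodes first, then remove them,
    // so the document is not modified while the result is being iterated.
    $targets = [];
    foreach ($xpath->query('//text()[normalize-space(.) = ""]') as $text) {
        if (in_array($text->parentNode->nodeName, $noText, true)) {
            $targets[] = $text;
        }
    }
    foreach ($targets as $text) {
        $text->parentNode->removeChild($text);
    }

    // Serialize the children of <body> to get the fragment back.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveXML($child);
    }
    return $out;
}

$result = stripStructuralWhitespace(
    '<table><tr> <td><i>hi</i> <b>there</b></td> </tr></table>'
);
echo $result, PHP_EOL;
```

The whitespace directly inside tr should disappear, while the space between the i and b elements stays, because its parent (td) does allow text.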
$str = "<td> </td>";
$str2 = "<td></td>";
var_dump(preg_match('/\s/',$str));
var_dump(preg_match('/\s/',$str2));
The first call prints int(1), because $str contains whitespace.
The second call prints int(0), because $str2 does not.