
Remove all height tags/attributes from table elements using preg_replace?

In our system, users often copy and paste tables from other sources such as Excel or Word, which leaves height attributes and height style declarations scattered throughout the table markup. These height specifications cause problems when our PHP API uses the HTML to output formatted reports. I thought I could use pattern matching (preg_replace) to find and remove them, but I've spent the last 3 days trying without much success, as I'm not adept at using regular expressions this way.

I've read the documentation and examples on php.net and reviewed quite a few of the posts here on this topic, but I still don't know how to apply the pattern matching only to instances inside a tag.

Also, how would I remove the entire style attribute if it only contains a height declaration, but remove only the height declaration if other declarations are present?

Here is an example of the markup I need to clean. This is just a small portion; the content typically includes multiple table elements similar to what I've included below, along with images, text, etc.:

<table style="height:126px;" width="243">
    <tbody>
        <tr style="height: 18px;">
            <td style="width: 38.5px; height: 18px;">ABC</td>
            <td style="width: 41.5469px; height: 18px;">123</td>
            <td style="width: 50.6562px; height: 18px;">DEF;</td>
            <td style="width: 99.2969px; height: 18px;">456</td>
        </tr>
            <tr style="height:18px;">
            <td style="width: 38.5px; height: 18px;">GHI</td>
            <td style="width: 41.5469px; height: 18px;">789</td>
            <td style="width: 50.6562px; height: 18px;">JKL</td>
            <td style="width: 99.2969px; height: 18px;">012</td>
        </tr>
            <tr style="height:18px;">
            <td style="width: 38.5px; height: 18px;">MNO</td>
            <td style="width: 41.5469px; height: 18px;">345</td>
            <td style="width: 50.6562px; height: 18px;">PQR</td>
            <td style="width: 99.2969px; height: 18px;">678</td>
        </tr>
    </tbody>
</table>

Can this be done using preg_replace, or do I need to use a different technique? Any guidance or assistance would be greatly appreciated. A "cleaned" version of the above would look like this:

Cleaned

<table width="243">
    <tbody>
        <tr>
            <td style="width: 38.5px;">ABC</td>
            <td style="width: 41.5469px;">123</td>
            <td style="width: 50.6562px;">DEF;</td>
            <td style="width: 99.2969px;">456</td>
        </tr>
            <tr>
            <td style="width: 38.5px;">GHI</td>
            <td style="width: 41.5469px;">789</td>
            <td style="width: 50.6562px;">JKL</td>
            <td style="width: 99.2969px;">012</td>
        </tr>
            <tr>
            <td style="width: 38.5px;">MNO</td>
            <td style="width: 41.5469px;">345</td>
            <td style="width: 50.6562px;">PQR</td>
            <td style="width: 99.2969px;">678</td>
        </tr>
    </tbody>
</table>

Have you considered simply swapping the 'height:' style property for a non-existent one? Browsers ignore CSS properties they don't recognise. For example:

$str = '<table style="height:126px;" width="243">
    <tbody>
        <tr style="height: 18px;">
            <td style="width: 38.5px; height: 18px;">ABC</td>
            <td style="width: 41.5469px; height: 18px;">123</td>
            <td style="width: 50.6562px; height: 18px;">DEF;</td>
            <td style="width: 99.2969px; height: 18px;">456</td>
        </tr>
            <tr style="height:18px;">
            <td style="width: 38.5px; height: 18px;">GHI</td>
            <td style="width: 41.5469px; height: 18px;">789</td>
            <td style="width: 50.6562px; height: 18px;">JKL</td>
            <td style="width: 99.2969px; height: 18px;">012</td>
        </tr>
            <tr style="height:18px;">
            <td style="width: 38.5px; height: 18px;">MNO</td>
            <td style="width: 41.5469px; height: 18px;">345</td>
            <td style="width: 50.6562px; height: 18px;">PQR</td>
            <td style="width: 99.2969px; height: 18px;">678</td>
        </tr>
    </tbody>
</table>';

$str = str_replace("height:","nulled:",$str);

echo $str;

I took your table HTML, put it into a string variable, and used a simple str_replace to change every occurrence of height: to nulled:. This doesn't remove the declarations from the string, but nulled is not a real CSS property, so it is ignored, and when I echo the string back the table renders like the cleaned one in your example.
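If the height declarations need to be physically removed rather than renamed (for instance because the consumer is not a browser, or because a plain str_replace on height: would also rewrite line-height: and min-height:), a preg_replace-based sketch might look like the one below. The function name stripHeights is made up for illustration, and the code assumes style attributes are double-quoted and contain no semicolons inside values; regular expressions are not a full HTML parser, so treat it as a starting point rather than a complete solution.

```php
<?php
// Hypothetical helper (name not from the original post): strips height
// declarations and height attributes with regular expressions.
function stripHeights(string $html): string
{
    // Step 1: inside each double-quoted style="..." value, drop "height: ...;".
    $html = preg_replace_callback(
        '/\sstyle\s*=\s*"([^"]*)"/i',
        function (array $m): string {
            // (?<![\w-]) leaves line-height, min-height and max-height alone.
            $style = preg_replace('/(?<![\w-])height\s*:[^;]*;?\s*/i', '', $m[1]) ?? $m[1];
            $style = trim($style);
            // Remove the whole style attribute when nothing else is left in it.
            return $style === '' ? '' : ' style="' . $style . '"';
        },
        $html
    ) ?? $html;

    // Step 2: drop a legacy height="..." attribute that sits inside a tag.
    return preg_replace(
        '/(<[^>]*?)\sheight\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
        '$1',
        $html
    ) ?? $html;
}
```

Calling stripHeights($str) on the sample table should leave width="243" and the width declarations in place while dropping every height declaration, which matches the "cleaned" markup in the question.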

There may be an even prettier approach, but this worked for me. :)
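One candidate for a "prettier" approach is to parse the markup instead of pattern-matching it. The sketch below uses PHP's bundled DOM extension (DOMDocument and DOMXPath); the function name stripHeightsWithDom is hypothetical. It assumes the input is an HTML fragment in plain ASCII (loadHTML needs an encoding hint for UTF-8 content) and that style values contain no semicolons. The serializer may also normalise whitespace or quoting slightly, so the output is not guaranteed to be byte-identical to the input.

```php
<?php
// Hypothetical helper: removes height attributes and height style
// declarations by walking the parsed document (requires ext-dom).
function stripHeightsWithDom(string $html): string
{
    // Collect parser warnings internally; pasted Office markup is often sloppy.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@style or @height]') as $el) {
        $el->removeAttribute('height');
        if (!$el->hasAttribute('style')) {
            continue;
        }
        // Keep every declaration whose property name is not exactly "height".
        $kept = [];
        foreach (explode(';', $el->getAttribute('style')) as $decl) {
            $decl = trim($decl);
            $name = strtolower(trim(explode(':', $decl, 2)[0]));
            if ($decl !== '' && $name !== 'height') {
                $kept[] = $decl;
            }
        }
        if ($kept) {
            $el->setAttribute('style', implode('; ', $kept) . ';');
        } else {
            $el->removeAttribute('style'); // nothing left, drop the attribute
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Compared with a regular expression, this only ever touches real attributes on real elements, so text content that happens to contain the word "height" is left alone.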
