Remove all HTML tags except the allowed ones using an XSLT function

Question

I am trying to use XSLT to clean up some data that I get from an RSS feed. I want to remove all tags except p tags.

 Cows are kool.<p>The <i>milk</i> <b>costs</b> $1.99.</p>

I am open to solving this with either XSLT 1.0 or 2.0.

1) I have seen this example: https://maulikdhorajia.blogspot.in/2011/06/removing-html-tags-using-xslt.html

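But I need the p tags to remain, and it needs to use a regular expression. Could we use something like a string-before-match function and proceed in a similar way? I do not think such a function exists in XPath.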

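2) I know the replace function cannot be used for this, because it expects a string. If we pass it a node, the node's text content is extracted and passed to the function, so the tags are already gone and nothing is left to remove.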

I am a bit confused by this answer, which does use replace: https://stackoverflow.com/a/18528749/745018

3) This is being done in an nginx server that uses XSLT.

Below is a sample of the body tag from the RSS feed we receive.

<p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on <h2>March 31</h2> because the judge ignored an earlier court order summoning him.<i>Justice Karnan</i> had to appear</p>

Update: I am also looking for an XSLT function for this.

Answer 1

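Assuming you can use XSLT 2.0, you could apply David Carlisle's HTML parser ( https://github.com/davidcarlisle/web-xslt/blob/master/htmlparse/htmlparse.xsl ) to the content of the body element and then process the resulting nodes in a mode that strips every element except p elements: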

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:d="data:,dpc"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="d xhtml">

    <xsl:import href="htmlparse-by-dcarlisle.xsl"/>

    <xsl:template match="@*|node()" mode="#default strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="d:htmlparse(., '', true())" mode="strip"/>
        </xsl:copy>
    </xsl:template>

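    <!-- Drop the tags of every element other than p, but keep processing its
         content in the same mode so that nested elements are stripped as well -->
    <xsl:template match="*[not(self::p)]" mode="strip">
        <xsl:apply-templates mode="#current"/>
    </xsl:template>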

</xsl:transform>

For the input

<rss>
    <entry>
        <body><![CDATA[<p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on <h2>March 31</h2> because the judge ignored an earlier court order summoning him.<i>Justice Karnan</i> had to appear</p>]]></body>
    </entry>
</rss>

this gives

<rss>
    <entry>
        <body><p>The Supreme Court issued on Friday a bailable warrant against sitting Calcutta high court justice CS Karnan, an unprecedented order in a bitter confrontation between the judge and the top court.</p><p>A seven-judge bench headed by Chief Justice of India JS Khehar issued the order directing Karnan’s presence on March 31 because the judge ignored an earlier court order summoning him.Justice Karnan had to appear</p></body>
    </entry>
</rss>
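Since the question also asks for a function, the same idea could be wrapped in an `xsl:function`. The following is only a sketch: the `my` namespace URI and the function name `my:strip-tags` are made up for illustration, and it assumes that `d:htmlparse` accepts a string, a namespace and a boolean, as in the call used above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:d="data:,dpc"
    xmlns:my="urn:example:strip-tags"
    exclude-result-prefixes="xs d my">

    <xsl:import href="htmlparse-by-dcarlisle.xsl"/>

    <!-- Hypothetical helper: parses an escaped HTML string and returns
         its nodes with every element except p unwrapped -->
    <xsl:function name="my:strip-tags" as="node()*">
        <xsl:param name="html" as="xs:string"/>
        <xsl:apply-templates select="d:htmlparse($html, '', true())" mode="strip"/>
    </xsl:function>

    <!-- Identity copy, used both in the default mode and in strip mode -->
    <xsl:template match="@*|node()" mode="#default strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <!-- Replace the escaped content of body with the cleaned nodes -->
    <xsl:template match="body">
        <xsl:copy>
            <xsl:sequence select="my:strip-tags(string(.))"/>
        </xsl:copy>
    </xsl:template>

    <!-- Unwrap every element other than p, staying in strip mode -->
    <xsl:template match="*[not(self::p)]" mode="strip">
        <xsl:apply-templates mode="#current"/>
    </xsl:template>

</xsl:transform>
```

The function can then be called from any expression that has the escaped markup as a string, for example `my:strip-tags(string(description))` for a hypothetical description element.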

If the input is not escaped but is included as XML markup in the input, there is no need to parse it; you simply apply the mode to the content:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:template match="@*|node()" mode="#default strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

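    <!-- Drop the tags of every element other than p, but keep processing its
         content in the same mode so that nested elements are stripped as well -->
    <xsl:template match="*[not(self::p)]" mode="strip">
        <xsl:apply-templates mode="#current"/>
    </xsl:template>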

</xsl:transform>

http://xsltransform.net/gWEamMc/1
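The question mentions nginx, whose XSLT module is built on libxslt and therefore on XSLT 1.0, where `mode="#default strip"` and `mode="#current"` are not available. A rough XSLT 1.0 equivalent of the stylesheet above might look like the sketch below. It assumes that the body content is real markup rather than escaped text, since parsing escaped HTML is not covered here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Identity copy for everything outside body -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Switch to strip mode for the content of body -->
    <xsl:template match="body">
        <xsl:copy>
            <xsl:apply-templates select="node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

    <!-- Allowed element: copy it and continue in strip mode -->
    <xsl:template match="p" mode="strip">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" mode="strip"/>
        </xsl:copy>
    </xsl:template>

    <!-- Keep the attributes of allowed elements; without this rule the
         built-in rule would output the attribute values as text -->
    <xsl:template match="@*" mode="strip">
        <xsl:copy/>
    </xsl:template>

    <!-- Any other element: drop the tag and its attributes, keep the content.
         This rule has lower default priority than the rule for p. -->
    <xsl:template match="*" mode="strip">
        <xsl:apply-templates select="node()" mode="strip"/>
    </xsl:template>

    <!-- Text nodes are output by the built-in rule; comments and processing
         instructions inside body are dropped by the built-in rules -->

</xsl:stylesheet>
```

To allow more tags, the match pattern of the copying rule can be extended, for example `match="p | br"`.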

Answer 1 (score 4, accepted, 2017-03-10 12:54:10)