Select specific tags in HTML using XSLT and print the contents of the tags to XML

Sorry about the last post.

Now I'll try to be clearer.

I need to select a few tags from an HTML document, and I have the following XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="node()|@*">
 <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
 </xsl:copy>
</xsl:template>

<xsl:template match="a"><xsl:apply-templates/></xsl:template>
<xsl:template match="img"/>
</xsl:stylesheet>

According to my understanding, this selects the <a> and <img> tags from an HTML page/document and prints the content inside those tags (tell me if my understanding is wrong). But the above XSLT outputs the entire HTML of the page. Can anyone point out where I went wrong and what needs to be done to make it right?

Thank you.

Your understanding is not quite right, I think. Looking at the templates in your XSLT in turn, you start off with the standard identity template:

<xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
</xsl:template>

This simply copies each element and its attributes, and then continues processing its child nodes. So, if you have an h1 tag in your HTML, it will be output as is. This is why you see the entire HTML of the page in the output.

You then follow up with a template for a elements:

 <xsl:template match="a"><xsl:apply-templates/></xsl:template>

Because this is more specific than the identity template (a name test such as a has a default priority of 0, whereas node() and @* have -0.5), it will take priority. In your case, it won't copy the a element, but it will process its children. Assuming your a element contains just text, that text will be output, but without the surrounding a tag.

Your final template matches the img element:

<xsl:template match="img"/>

But all this does is ignore the img element entirely: an empty template produces no output for the node it matches.

It is worth noting that XSLT has built-in templates, which it uses when it can't find a matching template in your stylesheet. For elements, these won't copy the element, but will continue to process its children. So if you don't want to copy all the HTML elements, you can just rely on the built-in templates, and only add templates for the elements you wish to take specific actions on.
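For reference, the XSLT 1.0 built-in rules behave roughly as if the following templates were present in every stylesheet. This is a sketch of the equivalents described in the specification, not something you need to write yourself:

```xml
<!-- Elements and the root node: output nothing for the node itself, just process its children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Text and attribute nodes: output the string value -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

<!-- Processing instructions and comments: output nothing -->
<xsl:template match="processing-instruction()|comment()"/>
```

Any template you write yourself takes precedence over these. Note also that attributes are only reached if you select them explicitly, because a bare xsl:apply-templates selects child nodes only.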

I am not 100% sure of your requirement, but if you just want to take some XHTML and output only the text within a elements, you could use this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output omit-xml-declaration="yes" indent="yes"/>
   <xsl:strip-space elements="*"/>

   <xsl:template match="a/text()">
       <xsl:value-of select="." />
   </xsl:template>

   <xsl:template match="text()" />

</xsl:stylesheet>

So, <xsl:template match="a/text()"> will output the text directly inside a elements, whereas the less specific <xsl:template match="text()"> will ignore all other text. (Text nested deeper, for example inside a b element within the a, would need a pattern such as a//text() instead.) The built-in template is used for all other elements, and as mentioned, this will not output them, just process their children (so eventually it will reach the text nodes).

So, for example, if you had this HTML:

<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <h1>Test</h1>
    Welcome!
    <img src="test.jpg" alt="Test Image" />
    <p><a href="test.html">Test Link</a></p>
  </body>
</html>

All that would be output is:

Test Link
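
If the goal is XML output rather than bare text, as the question title suggests, one possible variation wraps each link's text in an element. This is only a sketch: the element names links and link are made up for illustration, and it assumes the input is well-formed and that its elements are in no namespace, as in the sample above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Start at the root and visit only the a elements, wherever they occur -->
  <xsl:template match="/">
    <links>
      <xsl:apply-templates select="//a"/>
    </links>
  </xsl:template>

  <!-- Emit one hypothetical link element per a, keeping its href and its text -->
  <xsl:template match="a">
    <link href="{@href}">
      <xsl:value-of select="."/>
    </link>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the sample HTML, this should give a links element containing a single link element with href="test.html" and the text Test Link. Because only a elements are selected, no template is needed to suppress the other text or the img element.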
