
Select specific tags in HTML using XSLT and print the contents of the tags to XML

Sorry about the last post.

Now I'll try to be clearer.

I need to select a few tags from an HTML document, and I have the following XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="node()|@*">
 <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
 </xsl:copy>
</xsl:template>

<xsl:template match="a"><xsl:apply-templates/></xsl:template>
<xsl:template match="img"/>
</xsl:stylesheet>

As I understand it, this selects the <a> and <img> tags from an HTML page and prints the content inside them (tell me if my understanding is wrong). But the XSLT above outputs the entire HTML of the page. Can anyone point out where I have gone wrong and what needs to be done to fix it?

Thank You.

Your understanding is not quite right, I think. Looking at the templates in your XSLT in turn, you start off with the standard identity template:

<xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
</xsl:template>

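This simply copies every node it matches (elements, attributes, text, comments and processing instructions) and then carries on processing that node's attributes and children. So, if you have an h1 tag in your HTML, it will be output as is. Because this template matches everything that no other template claims, it is the reason you get the whole page back.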

You then follow up with a template for a elements.

 <xsl:template match="a"><xsl:apply-templates/></xsl:template>

Because this is more specific than the identity template (a pattern that names an element has a higher default priority than node()), it takes priority. In your case it won't copy the a element itself, but it will process its children. Assuming your a element contains only text, that text is output without the surrounding tag.

Your final template matches the img element

<xsl:template match="img"/>

But an empty template produces no output, so all this does is drop the img element, along with its attributes, from the result.
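
To see why the whole page comes back, here is a minimal sketch of what those three templates do to a small input. The input is made up for illustration, and the exact indentation of the result depends on the XSLT processor.

Input:

<body>
  <h1>Title</h1>
  <p><a href="page.html">Link</a></p>
  <img src="pic.jpg" alt="Picture"/>
</body>

Result (roughly):

<body>
  <h1>Title</h1>
  <p>Link</p>
</body>

Everything is copied by the identity template except the a element, which is replaced by its text, and the img element, which is dropped.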

It is worth noting that XSLT has built-in templates, which it uses when no template in the stylesheet matches a node. For elements, the built-in template does not copy the element but carries on processing its children; for text nodes, it outputs the text. So if you don't want to copy all the HTML elements, you can leave out the identity template, rely on the built-in templates, and add templates only for the nodes you want to treat specially.
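
For reference, the built-in rules of XSLT 1.0 behave as if the stylesheet contained the following templates. They are paraphrased here from the specification, and you never need to write them yourself:

<!-- Elements and the root node: output nothing, just process the children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Text and attribute nodes: output the string value -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

<!-- Comments and processing instructions: output nothing -->
<xsl:template match="processing-instruction()|comment()"/>

This is why a stylesheet with no templates of its own outputs just the text of the document, with the markup removed.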

I am not 100% sure of your requirement, but if you did just want to take some XHTML and output only the text within a elements, you could use this XSLT

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output omit-xml-declaration="yes" indent="yes"/>
   <xsl:strip-space elements="*"/>

   <xsl:template match="a/text()">
       <xsl:value-of select="." />
   </xsl:template>

   <xsl:template match="text()" />

</xsl:stylesheet>

So, <xsl:template match="a/text()"> outputs the text directly inside a elements, whereas the less specific <xsl:template match="text()"> matches all other text nodes and outputs nothing for them. The built-in template is used for the elements themselves and, as mentioned, it does not output them; it just processes their children, so processing eventually reaches the text nodes.

So, for example, if you had this HTML

<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <h1>Test</h1>
    Welcome!
    <img src="test.jpg" alt="Test Image" />
    <p><a href="test.html">Test Link</a></p>
  </body>
</html>

All that would be output would be

Test Link
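
If the goal is the one in the question title, that is, to pick out both the a and img elements and write their contents to XML, one possible sketch is shown below. The output element names links, link and image are invented for this example. It also assumes the HTML elements are in no namespace; for XHTML in the http://www.w3.org/1999/xhtml namespace, the match patterns would need a namespace prefix.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
   <xsl:strip-space elements="*"/>

   <!-- Start at the root and visit only the a and img elements, in document order -->
   <xsl:template match="/">
       <links>
           <xsl:apply-templates select="//a | //img"/>
       </links>
   </xsl:template>

   <!-- For each a element: keep the href and the text inside it -->
   <xsl:template match="a">
       <link href="{@href}"><xsl:value-of select="."/></link>
   </xsl:template>

   <!-- For each img element: keep the src and alt attributes -->
   <xsl:template match="img">
       <image src="{@src}" alt="{@alt}"/>
   </xsl:template>
</xsl:stylesheet>

Applied to the HTML above, this should produce something like the following (exact whitespace depends on the processor):

<links>
  <image src="test.jpg" alt="Test Image"/>
  <link href="test.html">Test Link</link>
</links>

Because the root template selects only the elements of interest, nothing else in the page is visited, so no catch-all template for text() is needed here.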


 