Extract text from XML using XSLT 1.0

I have the following XML file, which is part of a metadata record (I extracted just one part of it):

<?xml version="1.0" encoding="UTF-8"?>
        <gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml" xmlns:geonet="http://www.fao.org/geonetwork" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/gmd.xsd http://www.isotc211.org/2005/srv http://schemas.opengis.net/iso/19139/20060504/srv/srv.xsd">
        <gmd:pointOfContact>
                    <gmd:CI_ResponsibleParty>
                       <gmd:individualName xsi:type="gmd:PT_FreeText_PropertyType">
                          <gco:CharacterString>Freddie Mercury</gco:CharacterString>
                          <gmd:PT_FreeText>
                             <gmd:textGroup>
                                <gmd:LocalisedCharacterString locale="#ITA">Pippo</gmd:LocalisedCharacterString>
                             </gmd:textGroup>
                          </gmd:PT_FreeText>
                       </gmd:individualName>
                       <gmd:organisationName xsi:type="gmd:PT_FreeText_PropertyType">
                          <gco:CharacterString>The Queen</gco:CharacterString>
                          <gmd:PT_FreeText>
                             <gmd:textGroup>
                                <gmd:LocalisedCharacterString locale="#ITA">Music Institute</gmd:LocalisedCharacterString>
                             </gmd:textGroup>                         
                          </gmd:PT_FreeText>
                       </gmd:organisationName>
                       <gmd:positionName xsi:type="gmd:PT_FreeText_PropertyType">
                          <gco:CharacterString>Singer</gco:CharacterString>                              
                       </gmd:positionName>
                       <gmd:contactInfo>
                          <gmd:CI_Contact>
                             <gmd:phone>
                                <gmd:CI_Telephone>
                                   <gmd:voice>
                                      <gco:CharacterString>123456789</gco:CharacterString>
                                   </gmd:voice>
                                   <gmd:facsimile>
                                      <gco:CharacterString>123456789</gco:CharacterString>
                                   </gmd:facsimile>
                                </gmd:CI_Telephone>
                             </gmd:phone>
                             <gmd:address>
                                <gmd:CI_Address>                                  
                                   <gmd:city>
                                      <gco:CharacterString>Zanzibar</gco:CharacterString>
                                   </gmd:city>                                   
                                   <gmd:postalCode>
                                      <gco:CharacterString>00001</gco:CharacterString>
                                   </gmd:postalCode>
                                   <gmd:country>
                                      <gco:CharacterString>India</gco:CharacterString>
                                   </gmd:country>
                                   <gmd:electronicMailAddress>
                                      <gco:CharacterString>info@test.org</gco:CharacterString>
                                   </gmd:electronicMailAddress>
                                </gmd:CI_Address>
                             </gmd:address>
                             <gmd:transferOptions>
                <gmd:MD_DigitalTransferOptions>
                   <gmd:onLine>
                      <gmd:CI_OnlineResource>
                         <gmd:linkage>
                            <gmd:URL>http://www.google.it</gmd:URL>
                         </gmd:linkage>  
                      </gmd:CI_OnlineResource>
                   </gmd:onLine>               
                </gmd:MD_DigitalTransferOptions>
             </gmd:transferOptions>
                         </gmd:CI_Contact>
                       </gmd:contactInfo>
                       <gmd:role>
                          <gmd:CI_RoleCode codeListValue="pointOfContact" codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/Codelist/ML_gmxCodelists.xml#CI_RoleCode"/>
                       </gmd:role>
        </gmd:CI_ResponsibleParty>
        </gmd:pointOfContact>
    </gmd:MD_Metadata>

I would like to extract only two pieces of information from this file:

  1. the text Freddie Mercury (gco:CharacterString)
  2. the URL http://www.google.it (gmd:URL)

I started by trying the following XSLT transformation:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8" />

<gmd:MD_Metadata xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" gco:isoType="gmd:MD_Metadata"> 

    <xsl:template match="//gmd:pointOfContact">
        <xsl:apply-templates select="gco:CharacterString" />
    </xsl:template>

    <xsl:template match="gco:CharacterString">
        <xsl:text>name=</xsl:text>          
    </xsl:template>
</gmd:MD_Metadata>
</xsl:stylesheet>

but it is not working.

Could you please help me achieve this?

I don't know whether you want to output only the text or create a new XML document, but I think this stylesheet picks up the elements you want:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:gmd="http://www.isotc211.org/2005/gmd" 
xmlns:gts="http://www.isotc211.org/2005/gts" 
xmlns:gco="http://www.isotc211.org/2005/gco" 
xmlns:gml="http://www.opengis.net/gml" 
xmlns:geonet="http://www.fao.org/geonetwork">

<!-- create new root element -->
<xsl:template match="/">
    <root>
        <xsl:apply-templates/>
    </root>
</xsl:template>

<!-- default template: walks the whole tree and outputs nothing for nodes without a more specific template -->
<xsl:template match="node()|@*">
        <xsl:apply-templates select="node()|@*"/>    
</xsl:template>

<!-- output only on nodes we select -->
<xsl:template match="node()|@*" mode="output">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*" mode="output"/>
    </xsl:copy>
</xsl:template>    

<!-- match our two nodes and wrap in match tag. -->
<xsl:template match="gco:CharacterString[ancestor::gmd:individualName] | gmd:URL">
    <match>
        <xsl:apply-templates mode="output"/>
    </match>
</xsl:template>

</xsl:stylesheet>

This creates the output:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns:gmd="http://www.isotc211.org/2005/gmd"     xmlns:gts="http://www.isotc211.org/2005/gts"
xmlns:gco="http://www.isotc211.org/2005/gco"     xmlns:gml="http://www.opengis.net/gml"
xmlns:geonet="http://www.fao.org/geonetwork">
<match>Freddie Mercury</match>
<match>http://www.google.it</match>
</root>

If you just want text output, you can specify that with the method attribute of xsl:output. Also, your description said you only wanted to output Freddie Mercury; gmd:individualName was unique in this case, but I'm not sure what tagging variations exist across the set of files you want to use this on.

This file contained only one gmd:URL tag. Again, I'm not sure what kind of variation might exist, but this produces the output asked for in your question.
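For the text-only case, a minimal sketch could look like the following. It assumes the input has exactly the structure shown in the question, with one gmd:individualName and one gmd:URL. The "name=" label is borrowed from the attempt in the question; the "url=" label and the one-value-per-line layout are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">

  <!-- emit plain text instead of XML -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- start at the document root and pull the two values out directly;
       no other template is needed, so nothing else reaches the output -->
  <xsl:template match="/">
    <!-- "name=" comes from the question; "url=" is an assumed label -->
    <xsl:text>name=</xsl:text>
    <!-- in XSLT 1.0, xsl:value-of uses the first node of the selected set -->
    <xsl:value-of select="//gmd:individualName/gco:CharacterString"/>
    <xsl:text>&#10;url=</xsl:text>
    <xsl:value-of select="//gmd:URL"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Note that the gmd and gco prefixes are declared on xsl:stylesheet itself and the xsl:template is a direct child of it. In the stylesheet from the question, the templates are wrapped in a gmd:MD_Metadata element, and gco:CharacterString is selected as a child of gmd:pointOfContact although it sits several levels deeper; both are likely reasons it produced nothing useful. If a file contains several gmd:URL elements, xsl:value-of will return only the first one, so an xsl:for-each or a dedicated template would be needed to list them all.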
