
How can I extract multiple attribute values from an XML file using R?

Link to the XML text. Please remove "http:" from the link text.

Below is the content of the XML text.

<?xml version="1.0" standalone="yes"?>
  <hydstra_xml_store date_format="BRITISH" version="1" application="WEBBATLS" timestamp="20170111144727">
   <webbatls>
   <site station="G0010001" parent="" stname="Sandover River - #7 Bore" shortname="Sandover R #7 Bore" mapname="" zone="53" easting="476064.2" northing="7552556.9" grdatum="MGA94" latitude="-22.131884543" longitude="134.767896023" lldatum="GDA94" posacc="" elev="12.658" elevdatum="" elevacc="01" local_map="" timezone="9.5" qquarter="" quarter="" section="" township="" range="" meridian="" active="False" owner="" commence="16/10/1968" cease="15/05/1974" stntype="SWG" region="5" orgcode="NTP" barcode="" category1="" category2="" category3="" category4="" category5="" category6="" category7="" category8="" category9="" category10="" category11="" category12="" category13="" category14="" category15="" category16="" category17="" category18="" category19="" category20="" spare1="" spare2="" enteredby="" checkedby="HYD" comment="" dbver41="" datecreate="30/12/1899" timecreate="0" usercreate="" datemod="09/12/2016" timemod="1111" usermod="BAILJ">
   <_decode_ station="Sandover R #7 Bore" parent="(None)" grdatum="Map Grid of Australia 1994" latitude="22&#176;07&apos;54.8&quot;S" longitude="134&#176;46&apos;04.4&quot;E" lldatum="Geodetic Datum of Australia 1994" posacc="Prec unknown" elevdatum="(unknown)" elevacc="Not Applicable" timezone="Offset of standard local time from GMT" meridian="Unspecified" owner="(None)" stntype="SW Gauging Station" region="NT Wide" orgcode="NT Water Resources" category1="(Not set)" category2="(Not set)" category3="(Not set)" category4="(unknown)" category5="Unknown" category6="Unknown" category7="Unknown" category8="Unknown" category9="Unknown" category10="Unknown" category11="(unknown)" category12="(unknown)" category13="(unknown)" category14="(unknown)" category15="(unknown)" category16="(unknown)" category17="(unknown)" category18="(unknown)" category19="(unknown)" category20="(unknown)" enteredby="(unknown)" checkedby="HYDMG &lt;Data imported by H"/>
</site>
<station station="G0010001" gauge="0" datum="GD" control="sandy river bed" contcode="" ctf="1.368" downst="False" gaugfacil="" hut="False" telemetry="False" streamdist="0" phone="" spillway="0" qmin="0" tmin="0" maxgaug="0" maxgdate="30/12/1899" catcharea="5050" enteredby="DRK" checkedby="HYD" bedslope="0" order="0" spare1="" spare2="" spare3="" spare4="" spare5="" dbver22="" datecreate="30/12/1899" timecreate="0" usercreate="" datemod="03/01/2017" timemod="212" usermod="SVCACC">
<_decode_ station="Sandover R #7 Bore" datum="Gauge Datum" contcode="Unknown" streamdist="km" tmin="Mins" catcharea="sq. km" enteredby="Doug Kinter" checkedby="HYDMG &lt;Data imported by H" spare1="(unknown)" spare2="(unknown)" spare3="(unknown)" spare4="(unknown)" spare5="(unknown)"/>
</station>
<stninis/>
<periods/>
<gwholes/>
<aquifers/>
<variables/>
<contents_lists>
<content_list secttype="Reports" secttypestr="Reports" section="WEBREPORTSDWHSW"/>
<content_list secttype="Documents" secttypestr="Documents" section="WEBDOCUMENTSDWHSW"/>
</contents_lists>
</webbatls>
</hydstra_xml_store>

From this XML text, I want to extract the attribute values (the quoted words shown in blue) between <site> and </site>, for example "G0010001", "Sandover River - #7 Bore", ...

Here is the code I used:

    url = "http://water.nt.gov.au/wgen/cache/anon/G0010001.xml?1484112860902?1484112861283"
    data = XML::xmlParse(readLines(url))
    XML::xpathSApply(data, "//webbatls/site[@station,@....]")

I can parse the XML text successfully, but I am having difficulty extracting the attribute values. I'm really stuck, so any help is appreciated.

This is the output I get after calling the xpathSApply function:

[[1]]
<site station="G0010001" parent="" stname="Sandover River - #7 Bore" shortname="Sandover R #7 Bore" mapname="" zone="53" easting="476064.2" northing="7552556.9" grdatum="MGA94" latitude="-22.131884543" longitude="134.767896023" lldatum="GDA94" posacc="" elev="12.658" elevdatum="" elevacc="01" local_map="" timezone="9.5" qquarter="" quarter="" section="" township="" range="" meridian="" active="False" owner="" commence="16/10/1968" cease="15/05/1974" stntype="SWG" region="5" orgcode="NTP" barcode="" category1="" category2="" category3="" category4="" category5="" category6="" category7="" category8="" category9="" category10="" category11="" category12="" category13="" category14="" category15="" category16="" category17="" category18="" category19="" category20="" spare1="" spare2="" enteredby="" checkedby="HYD" comment="" dbver41="" datecreate="30/12/1899" timecreate="0" usercreate="" datemod="09/12/2016" timemod="1111" usermod="BAILJ">
  <_decode_ station="Sandover R #7 Bore" parent="(None)" grdatum="Map Grid of Australia 1994" latitude="22&#xB0;07'54.8&quot;S" longitude="134&#xB0;46'04.4&quot;E" lldatum="Geodetic Datum of Australia 1994" posacc="Prec unknown" elevdatum="(unknown)" elevacc="Not Applicable" timezone="Offset of standard local time from GMT" meridian="Unspecified" owner="(None)" stntype="SW Gauging Station" region="NT Wide" orgcode="NT Water Resources" category1="(Not set)" category2="(Not set)" category3="(Not set)" category4="(unknown)" category5="Unknown" category6="Unknown" category7="Unknown" category8="Unknown" category9="Unknown" category10="Unknown" category11="(unknown)" category12="(unknown)" category13="(unknown)" category14="(unknown)" category15="(unknown)" category16="(unknown)" category17="(unknown)" category18="(unknown)" category19="(unknown)" category20="(unknown)" enteredby="(unknown)" checkedby="HYDMG &lt;Data imported by H"/>
</site>
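The output above is the <site> node itself: the XPath selects an element, and when no function is passed, xpathSApply hands back the matched nodes rather than their attribute values. The following is a minimal sketch, assuming the XML package is installed; the inline string is a trimmed copy of the question's <site> element (only a few attributes kept) so it runs without the original URL.

```r
library(XML)

# Trimmed <site> element from the question, parsed from a string
# instead of downloading the original URL.
txt <- '<webbatls>
  <site station="G0010001" stname="Sandover River - #7 Bore" zone="53"/>
</webbatls>'
doc <- xmlParse(txt, asText = TRUE)

# Passing xmlGetAttr plus an attribute name makes xpathSApply() return
# that attribute's value for each matched node instead of the node.
station <- xpathSApply(doc, "//webbatls/site", xmlGetAttr, "station")

# xmlAttrs() on the node returns every attribute as a named character vector.
site_attrs <- xmlAttrs(getNodeSet(doc, "//webbatls/site")[[1]])
```

Here site_attrs[["stname"]] gives "Sandover River - #7 Bore", and the names of site_attrs list every attribute available on the node.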

For example, save the text in a variable "a" and then try this :)

b <- gsub('"', '', a)

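I am the person who asked this question, and I finally found a solution after browsing other web resources. I am sharing it because it may help others facing the same problem.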

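    url = "http://water.nt.gov.au/wgen/cache/anon/G0010001.xml?1484112860902?1484112861283"
    data = XML::xmlParse(readLines(url))
    # For each attribute name, read that attribute from the <site> node;
    # t() turns the named result into a one-row matrix with named columns
    dum1 = t(sapply(c("station", "stname", "shortname", "zone", "latitude", "longitude",
                      "lldatum", "elev", "commence", "cease"),
                    function(x) XML::xpathSApply(data, "//site", XML::xmlGetAttr, x)))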

The output dum1 contains the values of the attributes "station", "stname", "shortname", "zone", "latitude", "longitude", "lldatum", "elev", "commence" and "cease".
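The sapply/xmlGetAttr approach above runs one XPath query per attribute, and because xmlGetAttr returns NULL for an attribute that is absent, the result may not simplify to a plain character matrix. A minimal alternative sketch, assuming the XML package is installed and using a trimmed inline copy of the data: read all attributes of the node once with xmlAttrs and subset them by name. The name "elev_missing" is hypothetical and only illustrates what happens with an absent attribute.

```r
library(XML)

txt <- '<webbatls>
  <site station="G0010001" stname="Sandover River - #7 Bore" zone="53"
        latitude="-22.131884543" longitude="134.767896023"/>
</webbatls>'
doc <- xmlParse(txt, asText = TRUE)

# "elev_missing" is a hypothetical attribute name that is not on the node.
wanted <- c("station", "stname", "zone", "elev_missing")

# Fetch every attribute of the first <site> node once, then keep only the
# wanted ones, in the requested order; absent names come back as NA.
site_attrs <- xmlAttrs(getNodeSet(doc, "//site")[[1]])
picked <- site_attrs[wanted]
names(picked) <- wanted   # restore the name lost on NA entries

# Same one-row matrix shape as dum1 in the answer above.
dum1 <- t(picked)
```

Missing attributes therefore show up as NA in their own column instead of changing the type of the whole result.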

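But I also need some attribute values from the <station> ... </station> element. For that, I reuse the last line of code (changing the attribute names and the XPath accordingly) and then combine the two one-row results column-wise with cbind.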

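    # Same pattern, but reading attributes from the <station> element
    dum2 = t(sapply(c("gauge", "datum", "control", "ctf", "streamdist", "maxgaug",
                      "maxgdate", "catcharea"),
                    function(x) XML::xpathSApply(data, "//station", XML::xmlGetAttr, x)))

    # Both matrices have one row, so they can be joined column-wise
    dat = cbind(dum1, dum2)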

Output:

dat
     station    stname                     shortname            zone latitude        longitude       lldatum elev     commence    
[1,] "G0010001" "Sandover River - #7 Bore" "Sandover R #7 Bore" "53" "-22.131884543" "134.767896023" "GDA94" "12.658" "16/10/1968"
     cease        gauge datum control           ctf     streamdist maxgaug maxgdate     catcharea
[1,] "15/05/1974" "0"   "GD"  "sandy river bed" "1.368" "0"        "0"     "30/12/1899" "5050"    
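The two sapply calls above repeat the same pattern for different elements. A minimal sketch of folding that pattern into one reusable function, assuming the XML package is installed; get_attrs is a hypothetical helper name, not part of the XML package, and the inline string is a trimmed copy of the question's data. It returns one data-frame row per matched node, so it also works if an XPath matches several nodes.

```r
library(XML)

# Hypothetical helper (not an XML package function): for every node matched
# by `xpath`, read the requested attributes and return one row per node.
get_attrs <- function(doc, xpath, attrs) {
  nodes <- getNodeSet(doc, xpath)
  rows <- lapply(nodes, function(node) {
    a <- xmlAttrs(node)
    if (is.null(a)) a <- character(0)   # node without any attributes
    vals <- a[attrs]                    # absent attributes become NA
    names(vals) <- attrs
    vals
  })
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

txt <- '<webbatls>
  <site station="G0010001" stname="Sandover River - #7 Bore" zone="53"/>
  <station station="G0010001" gauge="0" datum="GD" catcharea="5050"/>
</webbatls>'
doc <- xmlParse(txt, asText = TRUE)

site  <- get_attrs(doc, "//site", c("station", "stname", "zone"))
gauge <- get_attrs(doc, "//station", c("gauge", "datum", "catcharea"))

# One <site> and one <station> per file, so the rows line up for cbind().
dat <- cbind(site, gauge)
```

Returning a data frame rather than a character matrix makes it easy to convert individual columns later, for example as.numeric(dat$catcharea).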


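Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.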

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM