How to handle special characters like “&” in XML Parsing for R?

I am new to XML in R and need help with the following problem.

I have the following XML:

hp <- htmlParse('<li> <div class="subtle">Culture &amp; Values</div> <span   
class="notranslate gdBars sm " title="5.0"> <span class="gdBarsSep">Â </span><span  
class="gdBarsSep">Â </span><span class="gdBarsSep">Â </span><span class="gdBarsSep">Â  
</span><span class="gdBarsSep last">Â </span> <span style="width:94.5px" class="sel">
</span> </span>
</li> <li> <div class="subtle">Work/Life Balance</div> <span class="notranslate  
gdBars sm " title="4.0"> <span class="gdBarsSep">Â </span><span class="gdBarsSep">Â  
</span><span class="gdBarsSep">Â </span><span class="gdBarsSep">Â </span><span 
class="gdBarsSep last">Â </span> <span style="width:76.5px" class="sel"></span> </span> 
</li>')

In the above XML, I am trying to get the "title" attribute of the span that follows the div whose text is "Culture & Values", using the R code below. It does not give the expected output: I get NULL, although I expected "5.0".

CultureValues <- unlist(xpathApply(hp, "//div[text()='Culture &amp; Values']/following-sibling::span", xmlGetAttr, "title"))

Many thanks in advance.

You can use the following:

> xpathSApply(hp, "//*[contains(text(),'Culture ')]/following-sibling::span/@title")
title 
"5.0" 

Or change your query to use a literal & instead of &amp;:

> CultureValues<-unlist(xpathApply(hp,"//div[text()='Culture & Values']/following-sibling::span", xmlGetAttr,"title"))
> CultureValues
[1] "5.0"

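&amp; is just the entity name for &. The parser decodes it when it reads the document, so the text node in the parsed tree contains a literal &, and that is what the string in the XPath expression has to match.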
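
To see the decoding for yourself, here is a minimal sketch. It assumes the XML package is installed, uses a shortened copy of the markup from the question, and the variable names (html, doc, label, n_entity, n_literal, title) are only illustrative.

```r
library(XML)

# Shortened copy of the markup from the question (trimmed for illustration).
html <- '<li><div class="subtle">Culture &amp; Values</div><span class="gdBars" title="5.0"></span></li>'
doc <- htmlParse(html, asText = TRUE)

# Text of the div as stored in the parsed tree: the entity is already decoded.
label <- xpathSApply(doc, "//div[@class='subtle']", xmlValue)

# XPath compares against the decoded text, so the entity form matches no node ...
n_entity <- length(getNodeSet(doc, "//div[text()='Culture &amp; Values']"))
# ... while the literal ampersand matches the div.
n_literal <- length(getNodeSet(doc, "//div[text()='Culture & Values']"))

# The attribute of the span that follows the matched div.
title <- xpathSApply(doc, "//div[text()='Culture & Values']/following-sibling::span",
                     xmlGetAttr, "title")

print(label)
print(c(entity = n_entity, literal = n_literal))
print(title)
```

The entity form is only needed when the ampersand is written inside markup. An XPath expression passed from R is an ordinary string, so it takes the plain character.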
