R: Reading XML as data.frame

I cannot read an .xml file into a data.frame in R. I know this question already has great answers here and here, but I was not able to adapt those answers to my needs, so sorry if this is a duplicate.

I have an .xml file like this:

<?xml version='1.0' encoding='UTF-8'?>
<LexicalResource>
  <GlobalInformation label="Created with the standard propagation algorithm"/>
  <Lexicon languageCoding="UTF-8" label="sentiment" language="-">
    <LexicalEntry id="id_0" partOfSpeech="adj">
      <Lemma writtenForm="word"/>
      <Sense>
        <Confidence score="0.333333333333" method="automatic"/>
        <Sentiment polarity="negative"/>
        <Domain/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>

The file is stored locally. I tried it this way:

library(XML)
doc <- xmlParse("...\\test2.xml")
xmldf <- xmlToDataFrame(nodes = getNodeSet(doc, "//LexicalEntry/Lemma/Sense/Confidence/Sentiment"))

but the result is this:

> xmldf
data frame with 0 columns and 0 rows

So I tried the xml2 package:

library(xml2)
pg <- read_xml("...test2.xml")

recs <- xml_find_all(pg, "LexicalEntry")

> recs
{xml_nodeset (0)}

I have little experience with manipulating .xml files, so I think I'm missing the point. What am I doing wrong?

The data you need is stored in the nodes' attributes, not in their text values, which is why the methods you used do not work. Try something like this:

data.frame(as.list(xpathApply(doc, "//Lemma", fun = xmlAttrs)[[1]]), 
           as.list(xpathApply(doc, "//Confidence", fun = xmlAttrs)[[1]]), 
           as.list(xpathApply(doc, "//Sentiment", fun = xmlAttrs)[[1]]))

  writtenForm          score    method polarity
1        word 0.333333333333 automatic negative
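The snippet above takes only the first match (`[[1]]`) of each node type. If the real file holds several `LexicalEntry` nodes, one row per entry is probably what is wanted. Note also that in the sample `Lemma` and `Sense` are siblings under `LexicalEntry`, and `Confidence` and `Sentiment` are siblings under `Sense`, so the path `//LexicalEntry/Lemma/Sense/Confidence/Sentiment` from the question cannot match anything. Below is a minimal sketch with the XML package. It assumes every entry has exactly one `Lemma`, one `Confidence` and one `Sentiment`; the helper name `entry_to_row` is made up for illustration, and the sample is inlined (XML declaration omitted) so the code does not depend on a local file.

```r
library(XML)

# Inline copy of the sample document from the question
xml_string <- '<LexicalResource>
  <GlobalInformation label="Created with the standard propagation algorithm"/>
  <Lexicon languageCoding="UTF-8" label="sentiment" language="-">
    <LexicalEntry id="id_0" partOfSpeech="adj">
      <Lemma writtenForm="word"/>
      <Sense>
        <Confidence score="0.333333333333" method="automatic"/>
        <Sentiment polarity="negative"/>
        <Domain/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>'

doc <- xmlParse(xml_string, asText = TRUE)

# One node per lexical entry
entries <- getNodeSet(doc, "//LexicalEntry")

# Hypothetical helper: turn one <LexicalEntry> node into a one-row data.frame.
# Assumes exactly one Lemma, Confidence and Sentiment per entry.
entry_to_row <- function(entry) {
  data.frame(
    id           = xmlGetAttr(entry, "id"),
    partOfSpeech = xmlGetAttr(entry, "partOfSpeech"),
    writtenForm  = xpathSApply(entry, "./Lemma", xmlGetAttr, "writtenForm"),
    score        = as.numeric(xpathSApply(entry, "./Sense/Confidence", xmlGetAttr, "score")),
    polarity     = xpathSApply(entry, "./Sense/Sentiment", xmlGetAttr, "polarity"),
    stringsAsFactors = FALSE
  )
}

# Stack the per-entry rows into a single data.frame
xmldf <- do.call(rbind, lapply(entries, entry_to_row))
```

The relative paths (`./Lemma`, `./Sense/Confidence`) keep each lookup inside its own entry, so values from different entries are not mixed up.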

Another option is to collect all the attributes in the XML and build a data.frame from them:

df <- data.frame(as.list(unlist(xmlToList(doc, addAttributes = TRUE, simplify = TRUE))))
colnames(df) <- unlist(lapply(strsplit(colnames(df), "\\."), function(x) x[length(x)]))
df
                                            label writtenForm          score    method 
1 Created with the standard propagation algorithm        word 0.333333333333 automatic 
  polarity   id partOfSpeech languageCoding     label language
1 negative id_0          adj          UTF-8 sentiment        -
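The xml2 attempt in the question can be made to work along the same lines. The XPath `"LexicalEntry"` only matches direct children of the root element, and here `LexicalEntry` sits one level deeper, inside `Lexicon`, which is why the node set came back empty; `.//LexicalEntry` searches all descendants instead. The following is a sketch rather than a definitive solution: the choice of columns is an assumption, and the sample is inlined (XML declaration omitted) in place of the local file.

```r
library(xml2)

# Inline copy of the sample document from the question
pg <- read_xml('<LexicalResource>
  <GlobalInformation label="Created with the standard propagation algorithm"/>
  <Lexicon languageCoding="UTF-8" label="sentiment" language="-">
    <LexicalEntry id="id_0" partOfSpeech="adj">
      <Lemma writtenForm="word"/>
      <Sense>
        <Confidence score="0.333333333333" method="automatic"/>
        <Sentiment polarity="negative"/>
        <Domain/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>')

# Search all descendants, not just the direct children of the root
recs <- xml_find_all(pg, ".//LexicalEntry")

# xml_find_first() returns one result per entry (a missing node where there is
# no match), so every column has one value per entry and NA where data is absent
xmldf <- data.frame(
  id           = xml_attr(recs, "id"),
  partOfSpeech = xml_attr(recs, "partOfSpeech"),
  writtenForm  = xml_attr(xml_find_first(recs, "./Lemma"), "writtenForm"),
  score        = as.numeric(xml_attr(xml_find_first(recs, "./Sense/Confidence"), "score")),
  polarity     = xml_attr(xml_find_first(recs, "./Sense/Sentiment"), "polarity"),
  stringsAsFactors = FALSE
)
```

Because `xml_attr()` and `xml_find_first()` are vectorised over the node set, no explicit loop over the entries is needed.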
