What is the difference between xmlParse and xmlTreeParse in the R XML package?

Question

When would I want to use the xmlParse function versus the xmlTreeParse function? Also, when are the parameter values useInternalNodes=TRUE or asText=TRUE useful?

For example:

library("XML")
nct_url <- "http://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true"
xml_doc <- xmlParse(nct_url, useInternalNodes=TRUE)

vs.

library("RCurl")  # getURL() comes from the RCurl package
doc <- xmlTreeParse(getURL(nct_url), useInternalNodes=TRUE)
top <- xmlRoot(doc)
top[["keyword"]]
xmlValue(top[["start_date"]])
xmlValue(top[["location"]])

People seem to use the xmlTreeParse function to get a non-repeating node via the $doc$children$... traversal, but I am not sure I understand when each approach is best. Parsing XML is one of the reasons I almost abandoned R to learn Python: there is a lack of beginner-level examples unless you are willing to buy a book.

Answer 1

Here is some feedback after using the XML package.

xmlParse is a version of xmlTreeParse where the argument useInternalNodes is set to TRUE.
If you want to get an R object, use xmlTreeParse. This can be inefficient and unnecessary if you only want to extract part of the XML document.
If you don't want an R object, just a C pointer, use xmlParse. But you should know some XPath basics to manipulate the result.
Use asText=TRUE if your input is text rather than a file or a URL.
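
The points above can be sketched in a few lines. This is a minimal illustration, not taken from the original answer: the sample XML string and the temporary file are assumptions made only for the demo.

```r
library(XML)

# Hypothetical sample content, used only for illustration
xml_text <- "<doc><el>aa</el></doc>"

# asText = TRUE: the first argument is the XML content itself
from_text <- xmlParse(xml_text, asText = TRUE)

# Default: the first argument is treated as a file name or URL
tmp <- tempfile(fileext = ".xml")
writeLines(xml_text, tmp)
from_file <- xmlParse(tmp)

# xmlTreeParse with useInternalNodes = TRUE gives an internal (C-level)
# document, the same kind of object that xmlParse returns
internal <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)
class(internal)

# Without useInternalNodes, xmlTreeParse builds a plain R structure
r_tree <- xmlTreeParse(xml_text, asText = TRUE)
class(r_tree)
```

Being explicit with asText=TRUE avoids any ambiguity about whether a string is meant as a path or as content.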

Here is an example showing the difference between the two functions:

txt <- "<doc>
          <el> aa </el>
       </doc>"
library(XML)
res <- xmlParse(txt,asText=TRUE)
res.tree <- xmlTreeParse(txt,asText=TRUE)

Now inspect the two objects:

> class(res)
[1] "XMLInternalDocument" "XMLAbstractDocument"
> class(res.tree)
[1] "XMLDocument"         "XMLAbstractDocument"

You can see that res is an internal document: it is a pointer to a C object. res.tree is an R object. You can access its contents like this:

> res.tree$doc$children
$doc
<doc>
 <el>aa</el>
</doc>
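
As a rough sketch of the same traversal using the package's accessor functions instead of the raw $doc$children path (the variable names top and el are illustrative, not from the original answer):

```r
library(XML)

txt <- "<doc><el> aa </el></doc>"
res.tree <- xmlTreeParse(txt, asText = TRUE)

# xmlRoot() returns the top-level node, i.e. res.tree$doc$children$doc
top <- xmlRoot(res.tree)
xmlName(top)              # name of the root element

# xmlChildren() returns the child nodes as a named R list
kids <- xmlChildren(top)
names(kids)

# [[ ]] on a node picks a child by name (or position)
el <- top[["el"]]
xmlValue(el)
```

Because everything here is an ordinary R list underneath, the usual list tools (names, length, lapply) apply directly.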

For res, you should use a valid XPath expression and one of these functions (xpathApply, xpathSApply, getNodeSet) to inspect it. For example:

xpathApply(res,'//el')
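
To make the three functions concrete, here is a small sketch. The id attributes and the second el element are assumptions added for the demo and are not part of the original example.

```r
library(XML)

# Hypothetical document with two matching nodes and an "id" attribute
txt <- "<doc><el id='1'> aa </el><el id='2'> bb </el></doc>"
res <- xmlParse(txt, asText = TRUE)

# getNodeSet: returns the matching internal nodes themselves
nodes <- getNodeSet(res, "//el")
length(nodes)

# xpathApply: applies a function to each match, always returns a list
as_list <- xpathApply(res, "//el", xmlValue)

# xpathSApply: same, but simplifies to a vector when it can
values <- xpathSApply(res, "//el", xmlValue)

# Extra arguments are passed on to the function, here the attribute name
ids <- xpathSApply(res, "//el", xmlGetAttr, "id")
```

In short, getNodeSet is useful when you want to keep working with the nodes, and the apply variants when you want values out in one step.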

Once you have a valid XML node, you can apply xmlValue, xmlGetAttr, etc. to extract node information. So here these two statements are equivalent:

## we already have an R object; just apply xmlValue to the right child
xmlValue(res.tree$doc$children$doc)
## xpathSApply creates an R reference to each matching node and passes it to xmlValue
xpathSApply(res,'//el',xmlValue)
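
The question mentions that the $doc$children traversal tends to be used for non-repeating nodes. A possible reason, sketched below with a hypothetical document containing a repeated el element, is that selecting a child by name returns only the first match, so repeated nodes need an explicit filter in the R tree but a single expression with XPath.

```r
library(XML)

# Hypothetical document: one unique child and one repeated child
txt <- "<doc><title>t</title><el> aa </el><el> bb </el></doc>"

# R-object approach
top <- xmlRoot(xmlTreeParse(txt, asText = TRUE))
first_only <- xmlValue(top[["el"]])      # only the first <el>
kids <- xmlChildren(top)
els <- kids[names(kids) == "el"]         # keep every child named "el"
all_tree <- unname(sapply(els, xmlValue))

# Internal-node approach: XPath selects all matches at once
res <- xmlParse(txt, asText = TRUE)
all_xpath <- xpathSApply(res, "//el", xmlValue)
```

Surrounding whitespace may be handled differently by the two parsers, so wrapping the results in trimws() is a reasonable precaution before comparing them.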

Answer 1: score 13, accepted, 2013-12-19 15:55:15
