How to access the values of child nodes with different names in an XML file?

Question

I am trying to parse the xmlValue of certain child nodes from an NCBI XML file. For some PMIDs, however, the root node <PubmedArticleSet> holds two different kinds of PubMed records: PubmedBookArticle and PubmedArticle. I would like to apply a condition: if (xmlName(fetch.pubmed) == "PubmedBookArticle") extract certain values, else if (xmlName(fetch.pubmed) == "PubmedArticle") extract other values. Finally, I want to build a data frame with both sets of values matched to their PMIDs. It seems simple, but xmlName(fetch.pubmed) throws the error: no applicable method for 'xmlName' applied to an object of class "c('XMLInternalDocument', 'XMLAbstractDocument')". Any help is appreciated, thank you.

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 1st January 2015//EN" "http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/pubmed_150101.dtd">
<PubmedArticleSet>
  <PubmedBookArticle>
    <BookDocument>
      <PMID Version="1">25506969</PMID>
      <ArticleIdList>
        <ArticleId IdType="bookaccession">NBK259188</ArticleId>
      </ArticleIdList> ....

   ...... </BookDocument>
  </PubmedBookArticle>

  <PubmedArticle>
    <MedlineCitation Status="Publisher" Owner="NLM">
      <PMID Version="1">25013473</PMID>
      <DateCreated>
        <Year>2014</Year>
        <Month>7</Month>
        <Day>11</Day>
      </DateCreated>....

    ....</MedlineCitation>
    </PubmedArticle>
</PubmedArticleSet>

My code is below:

library(XML)
library(rentrez)

PM.ID <- c("25506969", "25032371", "24983039", "24983034", "24983032", "24983031",
           "26386083", "26273372", "26066373", "25837167",
           "25466451", "25013473")
# rentrez function to retrieve the XML file for the PMIDs above
fetch.pubmed <- entrez_fetch(db = "pubmed", id = PM.ID,
                             rettype = "xml", parsed = TRUE)
# If empty records, return NA
FindNull <- function(x,x1child){
  res <- xpathSApply(x,x1child,xmlValue)
  if (length(res) == 0){
    out <- NA
  }else {
    out <- res
  }
  out
}

# extract contents from the XML file
xpathSApply(fetch.pubmed, "//PubmedArticle", FindNull, x1child = './/ArticleTitle')

xpathSApply(fetch.pubmed, "//PubmedBookArticle", FindNull, x1child = './/BookTitle')

How do I put the above code in a loop, so that I can retrieve values from PubmedArticle and PubmedBookArticle as and when the condition is met in each search?

Answer 1

There are a few ways you could do this, but I would probably get separate node sets for books and articles.

table( xpathSApply(fetch.pubmed, "/PubmedArticleSet/*", xmlName) )
    PubmedArticle PubmedBookArticle 
                6                 6 

books <- getNodeSet(fetch.pubmed, "/PubmedArticleSet/PubmedBookArticle")

data.frame( pmid = sapply(books, function(x) xpathSApply(x, ".//PMID", xmlValue)),
           title = sapply(books, function(x) xpathSApply(x, ".//BookTitle", xmlValue))
)

      pmid                                                                                                      title
1 25506969                                                     Probe Reports from the NIH Molecular Libraries Program
2 25032371                                                       Understanding Climate’s Influence on Human Evolution
3 24983039 Assessing the Effects of the Gulf of Mexico Oil Spill on Human Health: A Summary of the June 2010 Workshop
4 24983034                                                  In the Light of Evolution: Volume IV: The Human Condition
5 24983032                                            The Role of Human Factors in Home Health Care: Workshop Summary
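
The same idea can be written as a single pass that branches on each record's element name, which is what the question asks for. This is only a sketch under stated assumptions: xmlName() is called on each child node of <PubmedArticleSet> rather than on the document itself, since the error in the question reports that there is no xmlName method for a document object. The inline XML, its titles, and the helper names first_or_na and extract_record are made up for illustration; with real data, fetch.pubmed would take the place of doc.

```r
library(XML)

# Small stand-in for the parsed entrez_fetch() result; titles are hypothetical.
xml_text <- '<PubmedArticleSet>
  <PubmedBookArticle>
    <BookDocument>
      <PMID Version="1">25506969</PMID>
      <Book><BookTitle>Example book title</BookTitle></Book>
    </BookDocument>
  </PubmedBookArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">25013473</PMID>
      <Article><ArticleTitle>Example article title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>'
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: first match of a relative XPath, or NA if none.
# Taking only the first match assumes the record's own value comes first.
first_or_na <- function(node, path) {
  res <- xpathSApply(node, path, xmlValue)
  if (length(res) == 0) NA_character_ else res[[1]]
}

# Hypothetical helper: choose the title path from the record's element name.
extract_record <- function(node) {
  type <- xmlName(node)  # works here because node is a node, not the document
  title_path <- switch(type,
                       PubmedArticle     = ".//ArticleTitle",
                       PubmedBookArticle = ".//BookTitle",
                       NULL)
  data.frame(
    pmid  = first_or_na(node, ".//PMID"),
    type  = type,
    title = if (is.null(title_path)) NA_character_ else first_or_na(node, title_path),
    stringsAsFactors = FALSE
  )
}

# One row per record, in document order.
records <- getNodeSet(doc, "/PubmedArticleSet/*")
result <- do.call(rbind, lapply(records, extract_record))
print(result)
```

Because every record yields exactly one row, books and articles end up in the same data frame, each next to its own PMID.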

Answer 2

The NCBI XML paths below help extract abstracts from PubmedArticle and PubmedBookArticle records, including those articles without abstracts (NA).

abstracts <- xpathSApply(fetch.pubmed,
                         c('//PubmedArticle//Article', '//PubmedBookArticle//Abstract'),
                         function(x) { xmlValue(xmlChildren(x)$Abstract) })
abstracts <- data.frame(abstracts, stringsAsFactors = FALSE)
dim(abstracts)
rownames(abstracts) <- PM.ID
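
An alternative that may be easier to check is to walk the records one at a time, so that each PMID gets exactly one abstract value and missing abstracts show up as NA in the right row. This is a hedged sketch and not the answer's own method: the inline XML, its PMIDs 1 to 3, and the helper abstract_or_na are invented for illustration, and real records may need a more specific path than .//Abstract.

```r
library(XML)

# Hypothetical records: one article with an abstract, one book, one article without.
xml_text <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID>
    <Article><Abstract><AbstractText>Article abstract.</AbstractText></Abstract></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedBookArticle><BookDocument><PMID>2</PMID>
    <Abstract><AbstractText>Book abstract.</AbstractText></Abstract>
  </BookDocument></PubmedBookArticle>
  <PubmedArticle><MedlineCitation><PMID>3</PMID>
    <Article><ArticleTitle>No abstract here</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>'
doc <- xmlParse(xml_text, asText = TRUE)
records <- getNodeSet(doc, "/PubmedArticleSet/*")

# Hypothetical helper: text of the first Abstract element in a record, or NA.
abstract_or_na <- function(node) {
  res <- xpathSApply(node, ".//Abstract", xmlValue)
  if (length(res) == 0) NA_character_ else res[[1]]
}

# Build both columns from the same record list so rows stay aligned.
abstracts <- data.frame(
  pmid     = sapply(records, function(x) xpathSApply(x, ".//PMID", xmlValue)[[1]]),
  abstract = sapply(records, abstract_or_na),
  stringsAsFactors = FALSE
)
print(abstracts)
```

Reading the PMID from the same node as the abstract avoids relying on the order of PM.ID matching the order of records in the returned XML.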

Answer 1
1 accepted 2015-11-04 15:53:43

Answer 2
0 2016-01-19 18:01:10
