Convert XML data to a data frame in R

I am trying to convert an XML file to a data frame, but the result shows only a small part of the information, collapsed into a few columns.

library(XML)

# LOADING TRANSFORMED XML INTO R DATA FRAME
doc <- xmlParse("SRR12545290.xml") # https://www.ncbi.nlm.nih.gov/sra/?term=SRR12545290
xmldf <- xmlToDataFrame(doc)
head(xmldf)

This only shows

 │EXPERIMENT                                                                                                               
1│SRX903458416S amplicon of  Atlantic salmon: distal intestinal digestaSRP279301Illumina 16S metagenomic targeted sequenci…
 │SUBMISSION
1│SRA1118818
 │Organization                                                                                                             
1│Norwegian university of life scienceDepartment of Paraclinical SciencesNorwegian university of life scienceNO-0033OsloNo…
 │STUDY                                                                                                                    
1│SRP279301PRJNA660116ArcticFloraDiet with or without functional feed ingredients were fed to Atlantic salmon through fres…
 │SAMPLE                                                                                                                   
1│SRS7285186SAMN15936598FW-Ref749906gut metagenome['Distal intestinal digesta of Atlantic salmon', 'Distal intestinal dige…
 │Pool                  │RUN_SET                          
1│SRS7285186SAMN15936598│SRR12545290SRS7285186SAMN15936598

Instead, I want to get all the information present in the XML file, such as geographic location, host name, etc.

Here is an approach that parses the entire XML (using the xml2 package) to obtain the value of every leaf node along with its path.
It may not be exactly what you were looking for, but it is a start.

library(xml2)
library(dplyr)    
doc<-read_xml("SRR12545290.xml")


#find all the nodes
allnodes <- doc %>% xml_find_all( '//*')

#find the leaves: nodes with no child elements
#(xml_length() is applied to allnodes itself so the indices line up with it)
leafs <- which(xml_length(allnodes) == 0)

#get the value in the leafs
value <- (allnodes %>% xml_text())[leafs]

#get the path to the leaves to identify the source
name <- (allnodes %>% xml_path())[leafs]
   
#clean up naming
name <- gsub("/EXPERIMENT_PACKAGE_SET/EXPERIMENT_PACKAGE/", "", name)

#final result
data.frame(name, value)
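
If only the sample attributes (host, geographic location and so on) are needed, a narrower query may be easier to work with than the full list of leaves. The following is a minimal sketch, assuming the file stores each attribute as a SAMPLE_ATTRIBUTE element with TAG and VALUE children; check that against your own file. The inline XML is a made-up stand-in with placeholder values, not the real content of SRR12545290.xml.

```r
library(xml2)

# Hypothetical stand-in for the real file; element names are assumed to
# match the SRA layout, and the values are placeholders for illustration.
xml_txt <- '<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <SAMPLE>
      <SAMPLE_ATTRIBUTES>
        <SAMPLE_ATTRIBUTE><TAG>host</TAG><VALUE>Salmo salar</VALUE></SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE><TAG>geo_loc_name</TAG><VALUE>Norway</VALUE></SAMPLE_ATTRIBUTE>
      </SAMPLE_ATTRIBUTES>
    </SAMPLE>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>'

doc <- read_xml(xml_txt)  # for the real data use read_xml("SRR12545290.xml")

# one node per attribute
attrs <- xml_find_all(doc, "//SAMPLE_ATTRIBUTE")

# xml_find_first() returns one result per input node, so tag and value
# stay aligned even if an attribute is missing one of the children (NA)
sample_attrs <- data.frame(
  tag   = xml_text(xml_find_first(attrs, "./TAG")),
  value = xml_text(xml_find_first(attrs, "./VALUE")),
  stringsAsFactors = FALSE
)

sample_attrs
```

This gives one row per attribute, with the attribute name in one column and its value in the other, which is usually simpler to filter or reshape than the path-based names.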

