
Convert XML data to a data frame in R

I am trying to convert an XML file to a data frame, but the resulting columns show very little of the information.

library(XML)

# LOADING TRANSFORMED XML INTO R DATA FRAME
doc <- xmlParse("SRR12545290.xml") # https://www.ncbi.nlm.nih.gov/sra/?term=SRR12545290
xmldf <- xmlToDataFrame(doc)
head(xmldf)

This shows only:

 │EXPERIMENT                                                                                                               
1│SRX903458416S amplicon of  Atlantic salmon: distal intestinal digestaSRP279301Illumina 16S metagenomic targeted sequenci…
 │SUBMISSION
1│SRA1118818
 │Organization                                                                                                             
1│Norwegian university of life scienceDepartment of Paraclinical SciencesNorwegian university of life scienceNO-0033OsloNo…
 │STUDY                                                                                                                    
1│SRP279301PRJNA660116ArcticFloraDiet with or without functional feed ingredients were fed to Atlantic salmon through fres…
 │SAMPLE                                                                                                                   
1│SRS7285186SAMN15936598FW-Ref749906gut metagenome['Distal intestinal digesta of Atlantic salmon', 'Distal intestinal dige…
 │Pool                  │RUN_SET                          
1│SRS7285186SAMN15936598│SRR12545290SRS7285186SAMN15936598

Instead, I would like to get all the information present in the XML file, such as the geographic location, host name, and so on.

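Here is one way to parse the entire XML (using the xml2 package) and get the value of every leaf node along with its path.
I am not sure this is exactly what you are looking for, but it is a start.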

library(xml2)
library(dplyr)

doc <- read_xml("SRR12545290.xml")

# find all element nodes
allnodes <- doc %>% xml_find_all("//*")

# find the leaves: nodes with no child elements
# (xml_length() is applied to allnodes itself so the indices line up with allnodes)
leafs <- which(xml_length(allnodes) == 0)

# get the text value of each leaf
value <- (allnodes %>% xml_text())[leafs]

# get the path to each leaf to identify its source
name <- (allnodes %>% xml_path())[leafs]

# clean up naming by dropping the common path prefix
name <- gsub("/EXPERIMENT_PACKAGE_SET/EXPERIMENT_PACKAGE/", "", name)

# final result
data.frame(name, value)
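
To make the leaf-node idea easier to try without the original file, here is a minimal sketch on a small inline document. The XML string is invented for illustration, and the element names SAMPLE_ATTRIBUTE, TAG and VALUE are assumptions about how fields such as host and geographic location might be stored; check them against the actual file. The sketch selects the leaves directly with the XPath expression `//*[not(*)]`, then pairs each TAG with its VALUE.

```r
library(xml2)

# Toy document: structure and values are hypothetical, not taken from SRR12545290.xml
xml_txt <- paste0(
  "<EXPERIMENT_PACKAGE_SET><EXPERIMENT_PACKAGE><SAMPLE>",
  "<TITLE>gut metagenome</TITLE>",
  "<SAMPLE_ATTRIBUTES>",
  "<SAMPLE_ATTRIBUTE><TAG>host</TAG><VALUE>Salmo salar</VALUE></SAMPLE_ATTRIBUTE>",
  "<SAMPLE_ATTRIBUTE><TAG>geo_loc_name</TAG><VALUE>Norway</VALUE></SAMPLE_ATTRIBUTE>",
  "</SAMPLE_ATTRIBUTES>",
  "</SAMPLE></EXPERIMENT_PACKAGE></EXPERIMENT_PACKAGE_SET>"
)
toy_doc <- read_xml(xml_txt)

# Leaves are elements that have no child elements
leaves <- xml_find_all(toy_doc, "//*[not(*)]")

# One row per leaf: its path in the document and its text
leaf_df <- data.frame(
  name  = xml_path(leaves),
  value = xml_text(leaves),
  stringsAsFactors = FALSE
)

# One row per attribute node: pair each TAG with the VALUE in the same parent
# (xml_find_first() keeps one result per parent, giving NA where a child is missing)
attrs <- xml_find_all(toy_doc, "//SAMPLE_ATTRIBUTE")
attr_df <- data.frame(
  tag   = xml_text(xml_find_first(attrs, "./TAG")),
  value = xml_text(xml_find_first(attrs, "./VALUE")),
  stringsAsFactors = FALSE
)

print(leaf_df)
print(attr_df)
```

If the real file follows this layout, the tag/value table may be closer to what the question asks for than the raw list of leaves, since each field name lands next to its value in a single row.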

