Convert XML data to a data frame in R
I am trying to convert an XML file to a data frame, but the result shows very little information in its columns.
library(XML)
# LOADING TRANSFORMED XML INTO R DATA FRAME
doc <- xmlParse("SRR12545290.xml") # https://www.ncbi.nlm.nih.gov/sra/?term=SRR12545290
xmldf <- xmlToDataFrame(doc)
head(xmldf)
This only shows:
│EXPERIMENT
1│SRX903458416S amplicon of Atlantic salmon: distal intestinal digestaSRP279301Illumina 16S metagenomic targeted sequenci…
│SUBMISSION
1│SRA1118818
│Organization
1│Norwegian university of life scienceDepartment of Paraclinical SciencesNorwegian university of life scienceNO-0033OsloNo…
│STUDY
1│SRP279301PRJNA660116ArcticFloraDiet with or without functional feed ingredients were fed to Atlantic salmon through fres…
│SAMPLE
1│SRS7285186SAMN15936598FW-Ref749906gut metagenome['Distal intestinal digesta of Atlantic salmon', 'Distal intestinal dige…
│Pool │RUN_SET
1│SRS7285186SAMN15936598│SRR12545290SRS7285186SAMN15936598
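Instead, I want to get all the information present in the XML file, such as the geographic location, host name, and so on.
Here is one way to parse the whole XML file (using the xml2 package) and get the value of every leaf node along with its path.
I am not sure this is exactly what you are looking for, but it is a start.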
library(xml2)
library(dplyr)
doc <- read_xml("SRR12545290.xml")
# find all element nodes
allnodes <- doc %>% xml_find_all("//*")
# find the leaves: nodes with no child elements
# (xml_length() is applied to allnodes itself so the indices line up with allnodes)
leafs <- which(xml_length(allnodes) == 0)
# get the value in the leaves
value <- (allnodes %>% xml_text())[leafs]
# get the path to each leaf to identify its source
name <- (allnodes %>% xml_path())[leafs]
# clean up naming
name <- gsub("/EXPERIMENT_PACKAGE_SET/EXPERIMENT_PACKAGE/", "", name)
# final result
data.frame(name, value)
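If the fields you are after (geographic location, host name) are stored as tag/value pairs, it may be easier to pull those pairs out directly than to read them off the leaf paths. The sketch below is only an illustration under assumptions: the element names SAMPLE_ATTRIBUTE, TAG and VALUE, the tag names, and the inline document are stand-ins with placeholder values, not content taken from SRR12545290.xml, so check them against your own file.

```r
library(xml2)

# Tiny stand-in document; element names, tags and values are assumptions
# made up for illustration, not copied from the real file.
xml_txt <- '<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <SAMPLE>
      <SAMPLE_ATTRIBUTES>
        <SAMPLE_ATTRIBUTE><TAG>geo_loc_name</TAG><VALUE>example_location</VALUE></SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE><TAG>host</TAG><VALUE>example_host</VALUE></SAMPLE_ATTRIBUTE>
      </SAMPLE_ATTRIBUTES>
    </SAMPLE>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>'
doc <- read_xml(xml_txt)

# one node per tag/value pair
attrs <- xml_find_all(doc, "//SAMPLE_ATTRIBUTE")

# xml_find_first() returns one result per input node (NA text if missing),
# so the two columns stay aligned row by row
sample_attrs <- data.frame(
  tag   = xml_text(xml_find_first(attrs, "./TAG")),
  value = xml_text(xml_find_first(attrs, "./VALUE")),
  stringsAsFactors = FALSE
)
sample_attrs
```

To try this on the real file, replace xml_txt with the file name in read_xml() and adjust the XPath if your element names differ.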