How to extract file properties of multiple XML files and combine them with the XML extracted data (Using R)
I am fairly new to R and need some help to extract file names and file properties and combine them with data extracted from multiple XML files (about 200), which should then be converted into a data frame.
I am using the following script to select the XML files, extract the data, and convert it into a data frame (it runs without errors):
library(XML)
library(plyr)
# Select multiple xml files within directory
FileName <- list.files(pattern = "xml$",
ignore.case=TRUE,
full.names = FALSE)
# Create function to extract data
RI_ID <-function(FileName) {
doc1 <- xmlParse(FileName)
doc <- xmlToDataFrame(doc1["//ObjectList[@ObjectType='pkg']/o"], )
}
# Convert to dataframe
T1 <- ldply(FileName,RI_ID)
# Rename columns
names(T1)[names(T1) == "a"] <- "UniqueInstallationPackageID"
names(T1)[names(T1) == "b"] <- "PackageVersion_Latest"
# Convert to numeric
FieldToNumeric <- c("UniqueInstallationPackageID", "PackageVersion_Latest")
T1[,FieldToNumeric] <- lapply(T1[,FieldToNumeric], as.numeric)
I would like to (and need some help to):
I have reviewed the following two sources, but did not have any success in implementing them:
Due to a confidentiality agreement, I cannot share an example of the XML file, but, if need be, I can rename the nodes etc. and submit it. Thank you for your help.
Simply adjust the RI_ID function to retrieve those two pieces of information (the file name from the FileName variable and the modified date/time from file.info) and bind those values into new columns of the XML data frame. transform(), used below, allows adding columns to a data frame with comma-separated assignments:
# Create function to extract data and attach the file properties
RI_ID <- function(FileName) {
  doc <- xmlParse(FileName)
  df <- transform(xmlToDataFrame(doc["//ObjectList[@ObjectType='pkg']/o"]),
                  file_name = FileName,
                  file_modified = file.info(FileName)$mtime)
  # Return the data frame visibly so ldply() can row-bind the results
  df
}
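For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch. Since the real XML is confidential, it writes two tiny sample files whose layout (child nodes a and b under ObjectList/o) is only guessed from the column names in the question. The names tmp, sample_xml, read_one and combined are made up for illustration, and base R's do.call(rbind, lapply(...)) stands in for ldply so that plyr is not required.

```r
library(XML)

# Hypothetical sample files: the node names <a> and <b> are an assumption
# based on the column names renamed in the question.
tmp <- tempfile("xml_demo_")
dir.create(tmp)
sample_xml <- function(a, b) {
  sprintf(
    "<root><ObjectList ObjectType='pkg'><o><a>%s</a><b>%s</b></o></ObjectList></root>",
    a, b
  )
}
writeLines(sample_xml(1, 10), file.path(tmp, "first.xml"))
writeLines(sample_xml(2, 20), file.path(tmp, "second.xml"))

# Full paths, so file.info() works regardless of the working directory
files <- list.files(tmp, pattern = "\\.xml$", ignore.case = TRUE,
                    full.names = TRUE)

# Parse one file and attach its name and modification time as columns
read_one <- function(path) {
  doc <- xmlParse(path)
  df <- xmlToDataFrame(doc["//ObjectList[@ObjectType='pkg']/o"])
  df$file_name <- basename(path)
  df$file_modified <- file.info(path)$mtime
  df
}

# Row-bind the per-file data frames into one
combined <- do.call(rbind, lapply(files, read_one))
print(combined)
```

Using full.names = TRUE together with basename() keeps the stored file name short while still letting file.info() find the file when the script is run from another directory.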