Transform elements of a list into a table
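After reading a PDF in R with the pdftools package, I get a list in which every element has a table-like structure. I would like to combine all the elements into a single data frame while keeping their table structure.
Here is a link to the txt file that was produced: https://drive.google.com/open?id=0Bydt25g6hdY-b0NwaDF1NWE0NkU
I have tried: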
table <- list(0)
for (i in test5) { table <- append(table, i)}
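But I get the same list back.
I would like a single table in which each column is a variable and each row is an observation, with the date rows removed, if possible, so that they do not disturb the columns.
Here is dput(table[1:3]):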
list(" ",
c("\r\n Thu 04/21/2016 ", "\r\n _No Call Type Attached 0 00:00 00:00 00:00 00:00 00:00 0 0% 0% 00:00 00:00\r\n IEX Billing English 12.5% 1 03:17 00:55 00:03 04:15 00:00 2 200% 0% 00:27 00:00 1 100%\r\n IEX VOB English 50.0% 4 03:15 01:29 01:12 05:57 00:00 1 25% 0% 05:56 00:00 4 100%\r\n IEX VOB Spanish 37.5% 3 03:59 00:20 00:28 04:48 00:00 3 100% 0% 00:20 00:00 3 100%\r\n "
), "\r\n")
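Consider using readLines() to scan the document, then split each line on whitespace to get a list of character vectors. A couple of Filter() calls remove one-character and empty elements.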
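To make the split-and-filter step concrete, here is a minimal sketch applied to a single line. The sample line is copied from the dput() output in the question (whitespace collapsed); the variable names `line`, `tokens`, `kept` and `dropped` are only illustrative.

```r
# One data line taken from the dput() output in the question
line <- " IEX Billing English 12.5% 1 03:17 00:55 00:03 04:15 00:00 2 200% 0% 00:27 00:00 1 100%"

# Split on runs of whitespace; the leading space yields an empty first token
tokens <- strsplit(line, "\\s+")[[1]]

# Keep only tokens longer than one character (same rule as the Filter() call)
kept <- Filter(function(tok) nchar(tok) > 1, tokens)

# Tokens that the filter discarded: the empty string and the single-digit counts
dropped <- setdiff(tokens, kept)

print(kept)
print(dropped)
```

Note that this rule also discards single-digit values such as "1" and "2". If those counts matter, filtering with `nzchar()` instead would remove only the empty strings, at the cost of rows with more fields. The full script follows.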
file <- "C:\\Path\\To\\Text.txt"
# CONNECT TO FILE, READ LINES
con <- file(description=file, open="r")
pdftext <- readLines(con, warn=FALSE)
close(con)
# FILTER OUT ONE-CHARACTER ELEMENTS
pdftext <- Filter(function(x) nchar(x)>1, pdftext)
# SPLIT LINES BY WHITESPACE / FILTER ONE-CHARACTER ELEMENTS
datalines <- lapply(pdftext, function(x) {
  tmp <- strsplit(x, "\\s+")[[1]]
  Filter(function(l) nchar(l)>1, tmp)
})
# FILTER EMPTY ELEMENTS
datalines <- Filter(length, datalines)
# FILL IN NAs TO FIT TABLE COLS (USING 16, LARGEST LENGTH)
datalines <- lapply(datalines, function(x) {
  if (length(x) < 16) {
    x <- c(x, rep(NA, 16 - length(x)))
  }
  x
})
# BIND ALL LINES INTO CHARACTER MATRIX
datamatrix <- do.call(rbind, datalines)
Output
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
# [1,] "Thu" "04/21/2016" "Direct" "Internal" "Calls:" "Direct" "External" "Calls:" "Outbound" "Calls:" NA NA
# [2,] "_No" "Call" "Type" "Attached" "00:00" "00:00" "00:00" "00:00" "00:00" "0%" "0%" "00:00"
# [3,] "IEX" "Billing" "English" "12.5%" "03:17" "00:55" "00:03" "04:15" "00:00" "200%" "0%" "00:27"
# [4,] "IEX" "VOB" "English" "50.0%" "03:15" "01:29" "01:12" "05:57" "00:00" "25%" "0%" "05:56"
# [5,] "IEX" "VOB" "Spanish" "37.5%" "03:59" "00:20" "00:28" "04:48" "00:00" "100%" "0%" "00:20"
...
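The question also asks for a data frame without the date rows. One possible way to get there from the character matrix is sketched below. The small `datamatrix` here is only a stand-in built from a few values in the output above, and it assumes that date rows can be recognised by an mm/dd/yyyy value in the second column, as in row 1 of that output.

```r
# Stand-in for the character matrix built above (a few values copied from the output)
datamatrix <- rbind(
  c("Thu", "04/21/2016", "Direct",  "Internal", "Calls:"),
  c("IEX", "Billing",    "English", "12.5%",    "03:17"),
  c("IEX", "VOB",        "English", "50.0%",    "03:15")
)

# Assumption: a date row has an mm/dd/yyyy value in its second column
is_date_row <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", datamatrix[, 2])

# Drop the date rows and convert to a data frame; drop = FALSE keeps the
# matrix shape even if only one row remains
df <- as.data.frame(datamatrix[!is_date_row, , drop = FALSE],
                    stringsAsFactors = FALSE)

print(df)
```

All columns are still character at this point. Percentages and times would need their own conversion, which depends on how the columns are meant to be used.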