Looping and adding to a dataframe in R with a pdf extractor

I want to read in a PDF with pdftools, extract some data from it, and write it to a CSV. I have been able to do this successfully for one PDF, but I have many (440) to do. I'm trying to write a loop that goes through a list I have created that holds all my file paths. The problem is that it overwrites every time. So I think my program is doing what I've asked of it, but I am not asking for the correct thing! My code is below:

temp <-as.list(list.files(pattern = "*.pdf"))

file_path <- file.path(getwd(),temp)%>%as.list()

data_anz<-as.character()

for (i in 1:length(file_path)){
  data_anz <- pdf_text(file_path[[i]])[2] %>%
    str_split("\n") %>%
    .[[1]] %>%
    str_split_fixed("\\s{2,}", n = 4) %>%
    as.data.frame(i:length(file_path)) %>%
    rename(Commodity = V1, Level = V2, Change = V3, Description = V4)
}

What I would like to achieve is a data frame that grows with every iteration rather than being overwritten. So after the first run the df has 1 row and 4 cols, after the next run 2 rows, etc.

I'm lost! I can get it to work for an individual member of the list, and it seems to work through the whole list, but it overwrites.

Any help would be super appreciated!

Each iteration of the loop assigns your table to the same variable. You might want to try something like

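data_anz <- list()

for (i in seq_along(file_path)) {
  # Store each file's table in its own list slot instead of overwriting
  data_anz[[i]] <- ...
}
# do.call() takes the function first, then the list of arguments
data_anz_all <- do.call(rbind, data_anz)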

which puts each table into its own position in a list, and then row-binds them all together at the end (assuming the columns of the individual frames are compatible).
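For anyone who wants to try the pattern without a folder of PDFs, here is a minimal base-R sketch of the same idea. Everything in it is an assumption made for illustration: `pages` is made-up text standing in for what `pdf_text(path)[2]` would return, `parse_page()` is a hypothetical helper, and the `source` column is an optional extra for tracing rows back to their file. Note that it pads or truncates each line to four fields, which is not quite what `str_split_fixed(..., n = 4)` does with extra fields.

```r
# Made-up page text standing in for pdf_text(path)[2]; names act as file names.
pages <- list(
  "a.pdf" = "Gold  1900  +1.2%  Firm\nOil  80  -0.5%  Soft",
  "b.pdf" = "Wheat  600  +0.3%  Steady"
)

# Hypothetical helper: turn one page of text into a four-column data frame.
parse_page <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(trimws(lines))]            # drop empty lines
  fields <- strsplit(trimws(lines), "\\s{2,}")     # split on runs of 2+ spaces
  # Pad or truncate every row to exactly four fields
  fields <- lapply(fields, function(x) c(x, rep(NA_character_, 4))[1:4])
  df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(df) <- c("Commodity", "Level", "Change", "Description")
  df
}

# Pre-allocate one list slot per file, fill each slot, then bind once at the end.
tables <- vector("list", length(pages))
for (i in seq_along(pages)) {
  tables[[i]] <- parse_page(pages[[i]])
  tables[[i]]$source <- names(pages)[i]            # remember the originating file
}
data_anz_all <- do.call(rbind, tables)
```

Binding once after the loop also avoids copying a growing data frame on every iteration, which starts to matter with several hundred files.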
