R - apply function on two files in folders with for loop or lapply and save results in one dataframe
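I have a dataset of 20 folders inside "data", all with an identical structure. At the folder level the only difference is the name (from "1" to "20"); see the scheme below. The files always have the same file names and the same column structure. The length of the columns in the .csv files can differ between folders, but not between the .csv files within the same folder. There are no missing values in the data frames. I want to work with the "mean" column of the files.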
data
- 1 (folder)
  - alpha (file)
    - mean (column)
    - .... (more columns)
  - beta (file)
    - mean (column)
    - .... (more columns)
  - ... (more files)
- 2 (folder)
  - alpha (file)
    - mean (column)
    - .... (more columns)
  - beta (file)
    - mean (column)
    - .... (more columns)
  - ... (more files)
- ... (more folders with the same structure)
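Within each folder I want to compare the mean values of alpha with the mean values of beta. In the end, however, I want a single data frame that collects the results from all individual folders, so that I can create faceted boxplots and descriptive statistics from it.
I am still new to R and clearly lack the skills (apologies also for the clumsy code and my English). I can do the task manually for each folder, but I cannot put the results together with a for loop or an lapply solution.
I found many threads about merging data frames, but none that first apply a function to two files from the same folder. I hope I have produced a workable minimal example, consisting of 2 data frames in each of 2 folders.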
library(plyr)
library(tidyverse)
alpha1 <- read_csv('data/1/alpha.csv')
beta1 <- read_csv('data/1/beta.csv')
alpha2 <- read_csv('data/2/alpha.csv')
beta2 <- read_csv('data/2/beta.csv')
alpha1 <- structure(list(Name = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K"), mean = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L), spec = structure(list(
cols = list(Name = structure(list(), class = c("collector_character",
"collector")), mean = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
beta1 <- structure(list(Name = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K"), mean = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L), spec = structure(list(
cols = list(Name = structure(list(), class = c("collector_character",
"collector")), mean = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
alpha_mean <- alpha1 %>% select(mean_alpha = mean)
alphabeta <- alpha_mean %>% add_column(mean_beta = beta1$mean)
alphabeta_table <- ddply(alphabeta, .(), transform, alphabeta = (mean_alpha/mean_beta))
alphabeta_table
.id mean_alpha mean_beta alphabeta
1 <NA> 1 2 0.5000000
2 <NA> 2 3 0.6666667
3 <NA> 3 4 0.7500000
4 <NA> 4 5 0.8000000
5 <NA> 5 6 0.8333333
6 <NA> 6 7 0.8571429
7 <NA> 7 8 0.8750000
8 <NA> 8 9 0.8888889
9 <NA> 9 10 0.9000000
10 <NA> 10 11 0.9090909
11 <NA> 11 12 0.9166667
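The same per-folder table can also be built without plyr. Below is a minimal base R sketch, assuming both files in a folder have the same number of rows in the same order; the two small data frames repeat only the first three rows of the example. The `.id` column of `NA` in the output above appears to be a by-product of calling `ddply` with the empty grouping `.()`, so it does not show up here.

```r
# First three rows of the example data, rebuilt as plain data frames.
alpha1 <- data.frame(Name = c("A", "B", "C"), mean = c(1, 2, 3))
beta1  <- data.frame(Name = c("A", "B", "C"), mean = c(2, 3, 4))

# Pair the two mean columns row by row, then add their ratio.
alphabeta_table <- data.frame(mean_alpha = alpha1$mean, mean_beta = beta1$mean)
alphabeta_table$alphabeta <- alphabeta_table$mean_alpha / alphabeta_table$mean_beta
print(alphabeta_table)
```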
alpha2 <- structure(list(Name = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K", "L", "M"), mean = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -13L), spec = structure(list(
cols = list(Name = structure(list(), class = c("collector_character",
"collector")), mean = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
beta2 <- structure(list(Name = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J", "K", "L", "M"), mean = c(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -13L), spec = structure(list(
cols = list(Name = structure(list(), class = c("collector_character",
"collector")), mean = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
alpha2_mean <- alpha2 %>% select(mean_alpha = mean)
alphabeta2 <- alpha2_mean %>% add_column(mean_beta = beta2$mean)
alphabeta2_table <- ddply(alphabeta2, .(), transform, alphabeta = (mean_alpha/ mean_beta))
alphabeta2_table
.id mean_alpha mean_beta alphabeta
1 <NA> 2 3 0.6666667
2 <NA> 3 4 0.7500000
3 <NA> 4 5 0.8000000
4 <NA> 5 6 0.8333333
5 <NA> 6 7 0.8571429
6 <NA> 7 8 0.8750000
7 <NA> 8 9 0.8888889
8 <NA> 9 10 0.9000000
9 <NA> 10 11 0.9090909
10 <NA> 11 12 0.9166667
11 <NA> 12 13 0.9230769
12 <NA> 13 14 0.9285714
13 <NA> 14 15 0.9333333
My desired output is:
.id mean_alpha mean_beta alphabeta
1 1 1 2 0.5000000
2 1 2 3 0.6666667
3 1 3 4 0.7500000
4 1 4 5 0.8000000
5 1 5 6 0.8333333
6 1 6 7 0.8571429
7 1 7 8 0.8750000
8 1 8 9 0.8888889
9 1 9 10 0.9000000
10 1 10 11 0.9090909
11 1 11 12 0.9166667
1 2 2 3 0.6666667
2 2 3 4 0.7500000
3 2 4 5 0.8000000
4 2 5 6 0.8333333
5 2 6 7 0.8571429
6 2 7 8 0.8750000
7 2 8 9 0.8888889
8 2 9 10 0.9000000
9 2 10 11 0.9090909
10 2 11 12 0.9166667
11 2 12 13 0.9230769
12 2 13 14 0.9285714
13 2 14 15 0.9333333
1 3 ... ... ...
2 3 ... ... ...
...
Thanks for any help!
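Try this solution:
- Use list.dirs to get all folders.
- For each folder, read the "alpha" and "beta" files and return a 3-column tibble with the alpha, beta and alphabeta values.
- Bind all the data frames together with an id column, so you know which folder each value comes from.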
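# List the numbered folders directly under data/ (full paths).
all_folders <- list.dirs('data', recursive = FALSE, full.names = TRUE)
# Name each path after its folder, so the id column holds "1", "2", ... rather than a position index.
names(all_folders) <- basename(all_folders)
result <- purrr::map_df(all_folders, function(x) {
  # list.files returns paths in alphabetical order, so alpha comes before beta.
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha/beta)
}, .id = "id")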
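For readers who prefer the lapply route mentioned in the question, here is a minimal base R sketch of the same three steps. It is not the code from the answer above: the temporary directory, the helper name `ratio_for_folder` and the two small example folders are assumptions, made only so that the block runs on its own.

```r
# Build a throwaway copy of the layout: <root>/1 and <root>/2,
# each holding alpha.csv and beta.csv (hypothetical example data).
root <- file.path(tempdir(), "data_sketch")
unlink(root, recursive = TRUE)
example_means <- list("1" = 1:11, "2" = 2:14)  # alpha means; beta is alpha + 1, as in the question
for (id in names(example_means)) {
  dir.create(file.path(root, id), recursive = TRUE)
  a <- example_means[[id]]
  write.csv(data.frame(mean = a),     file.path(root, id, "alpha.csv"), row.names = FALSE)
  write.csv(data.frame(mean = a + 1), file.path(root, id, "beta.csv"),  row.names = FALSE)
}

# Hypothetical helper: read both files of one folder and return one data frame.
ratio_for_folder <- function(folder) {
  alpha <- read.csv(file.path(folder, "alpha.csv"))$mean
  beta  <- read.csv(file.path(folder, "beta.csv"))$mean
  data.frame(id = basename(folder), mean_alpha = alpha, mean_beta = beta,
             alphabeta = alpha / beta)
}

all_folders <- list.dirs(root, recursive = FALSE, full.names = TRUE)
# Sort numerically so that a folder "10" would not come before folder "2".
all_folders <- all_folders[order(as.numeric(basename(all_folders)))]

# Apply the helper to every folder and stack the results.
result <- do.call(rbind, lapply(all_folders, ratio_for_folder))

# Descriptive statistics per folder: mean ratio for each id.
print(aggregate(alphabeta ~ id, data = result, FUN = mean))
```

Because every row carries its folder name in `id`, the combined `result` should be usable directly for grouped summaries or for a plot split by folder, for example with `facet_wrap(~ id)` in ggplot2.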