R: apply functions over a list of data frames
Help with applying functions over a list of data frames.
I don't often work with lists or functions, so after 3 hours of searching and testing I need some assistance.
I have a list of 2 data frames as follows (the real list has 40+):
df1 <- structure(list(ID = 1:4,
Period = c("C_2021", "C_2021", "C_2021", "C_2021"),
subjects = c(2044L, 2044L, 2058L, 2059L),
Q_1_A = c(1L, 1L, 4L, 6L),
Q_1_B = c(6L, 1L, 6L, NA),
col3 = c(4L, 6L, 5L, 2L),
col4 = c(3L, 5L, 4L, 4L)),
class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(ID = 1:4,
Period = c("C_2022", "C_2022", "C_2022", "C_2022"),
subjects = c(2058L, 2058L, 2065L, 2066L),
Q_1_A = c(2L, 5L, 5L, 6L),
Q_1_B = c(6L, 1L, 4L, NA),
col3 = c(NA, 6L, 5L, 3L),
col4 = c(3L, 6L, 5L, 5L)),
class = "data.frame", row.names = c(NA, -4L))
The structure of the datasets is as follows:
df1
ID Period subjects Q_1_A Q_1_B col3 col4
1 1 C_2021 2044 1 6 4 3
2 2 C_2021 2044 1 1 6 5
3 3 C_2021 2058 4 6 5 4
4 4 C_2021 2059 6 NA 2 4
df2
ID Period subjects Q_1_A Q_1_B col3 col4
1 1 C_2022 2058 2 6 NA 3
2 2 C_2022 2058 5 1 6 6
3 3 C_2022 2065 5 4 5 5
4 4 C_2022 2066 6 NA 3 5
The list of data frames:
dflist <- list(df1, df2)
I would like to do 2 things:
1. Conditional removal of the string before the 2nd underscore
I would like to remove the characters up to and including the 2nd underscore, but only in columns beginning with "Q". Column "Q_1_A" would become "A". The code should only affect columns starting with "Q".
Note: The ifelse is important. In the real data there are other columns with 2 underscores that must not be modified, and the columns may be in a different order in each data frame, so the renaming has to be done by column name.
# doesn't work (can't seem to get purrr working either)
dflist <- lapply(dflist, function(x) {
names(x) <- ifelse(starts_with(names(x), "Q"), sub("^[^_]*_", "", names(x)), .x)
x})
2. Once column names are updated, keep only the columns named in a list.
Note: In the real data there are a lot of columns in each data frame, so it is much easier to list the columns to keep than the ones to remove.
The list of columns to keep is below. It assumes the renaming above has already been done.
col_keep <- c("ID", "Period", "subjects", "A", "B")
# doesn't work
dflist <- lapply(dflist, function(x) {
x[(names(x) %in% col_keep)]
x})
**UPDATE** I think actually the following will work
dflist <- lapply(dflist, function(x)
{x <- x %>% select(any_of(col_keep))})
# Is this the best way to do it?
Help would be greatly appreciated.
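For the first requirement, apply this:
dflist <- lapply(dflist, function(x) {
  # Rename only columns starting with "Q"; the character class strips every
  # Q, underscore, and digit from those names, so "Q_1_A" becomes "A"
  names(x) <- ifelse(startsWith(names(x), "Q"),
                     gsub("[Q_0-9]+", "", names(x)), names(x))
  x
})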
and for the second:
col_keep <- c("ID", "Period", "subjects", "A", "B")
dflist <- lapply(dflist, function(x) subset(x , select = col_keep))
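One caveat with the `gsub("[Q_0-9]+", ...)` pattern is that it removes those characters anywhere in the name, so a column such as "Q_2_Q10" would end up with an empty name. The sketch below is one possible alternative, not the only way to do it: it anchors the pattern at the start of the name and strips only up to the 2nd underscore. The helper name `strip_q_prefix` and the columns `Q_2_Q10` and `other_x_y` are made up for illustration.

```r
# Hypothetical example data: "other_x_y" has two underscores but must stay as-is,
# and "Q_2_Q10" has Q and digits after the 2nd underscore
df <- data.frame(ID = 1:2,
                 Q_1_A = c(1L, 4L),
                 Q_2_Q10 = c(6L, NA),
                 other_x_y = c("a", "b"))

# Illustrative helper (not from the original thread)
strip_q_prefix <- function(x) {
  nm <- names(x)
  is_q <- startsWith(nm, "Q")
  # Drop everything up to and including the 2nd underscore, Q columns only
  nm[is_q] <- sub("^[^_]*_[^_]*_", "", nm[is_q])
  names(x) <- nm
  x
}

renamed <- lapply(list(df, df), strip_q_prefix)
names(renamed[[1]])
# [1] "ID"        "A"         "Q10"       "other_x_y"
```

Because the renaming is done by name via a logical index, the position of the "Q" columns in each data frame does not matter.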
In base R:
lapply(dflist, \(x)setNames(x, sub('^Q([^_]*_){2}', '', names(x)))[col_keep])
[[1]]
ID Period subjects A B
1 1 C_2021 2044 1 6
2 2 C_2021 2044 1 1
3 3 C_2021 2058 4 6
4 4 C_2021 2059 6 NA
[[2]]
ID Period subjects A B
1 1 C_2022 2058 2 6
2 2 C_2022 2058 5 1
3 3 C_2022 2065 5 4
4 4 C_2022 2066 6 NA
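Indexing with `[col_keep]` stops with an "undefined columns selected" error if any data frame lacks one of the listed columns. If that can happen in the real list, a tolerant variant along the lines of `any_of()` might look like the following sketch. The helper name `rename_and_keep` and the two small data frames are assumptions made for illustration.

```r
col_keep <- c("ID", "Period", "subjects", "A", "B")

# Hypothetical data frames: the second one has no Q_1_B column
d1 <- data.frame(ID = 1:2, Period = "C_2021", subjects = c(2044L, 2058L),
                 Q_1_A = c(1L, 4L), Q_1_B = c(6L, NA), col3 = c(4L, 5L))
d2 <- data.frame(ID = 1:2, Period = "C_2022", subjects = c(2058L, 2065L),
                 Q_1_A = c(2L, 5L), col3 = c(NA, 6L))

# Illustrative helper (not from the original thread)
rename_and_keep <- function(x, keep) {
  names(x) <- sub("^Q([^_]*_){2}", "", names(x))
  # intersect() follows the order of `keep` and skips columns that are absent
  x[intersect(keep, names(x))]
}

out <- lapply(list(d1, d2), rename_and_keep, keep = col_keep)
lapply(out, names)
```

Here the first result keeps all five columns and the second keeps ID, Period, subjects and A. Whether silently skipping a missing column is desirable depends on the data; an error can be the safer behaviour.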
In tidyverse:
library(tidyverse)
dflist %>%
map(~rename_with(.,~str_remove(.,'([^_]+_){2}'), starts_with('Q'))%>%
select(all_of(col_keep)))
[[1]]
ID Period subjects A B
1 1 C_2021 2044 1 6
2 2 C_2021 2044 1 1
3 3 C_2021 2058 4 6
4 4 C_2021 2059 6 NA
[[2]]
ID Period subjects A B
1 1 C_2022 2058 2 6
2 2 C_2022 2058 5 1
3 3 C_2022 2065 5 4
4 4 C_2022 2066 6 NA
Another solution using base R:
# wrap up code for ease of reading
validate_names <- function(df) {
setNames(df, ifelse(grepl("^Q", names(df)),
gsub("[Q_0-9]", "", names(df)), names(df)))
}
# lapply to transform list, then subset with character vector
lapply(dflist, validate_names) |>
lapply(`[`, col_keep)
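With 40+ data frames it may help to check, before subsetting, which ones are missing a column from `col_keep`, since `[` with a character vector stops at the first data frame that lacks one. The following is a minimal sketch under the assumption that the list is named; the names `y2021` and `y2022` and the shortened `col_keep` are made up for illustration.

```r
col_keep <- c("ID", "A")

# Hypothetical named list standing in for the real 40+ data frames
dfs <- list(
  y2021 = data.frame(ID = 1:2, Q_1_A = c(1L, 4L), col3 = c(4L, 5L)),
  y2022 = data.frame(ID = 1:2, col3 = c(NA, 6L))  # no Q_1_A column
)

# lapply() keeps the list names, so results can be traced back to their source
renamed <- lapply(dfs, function(d) {
  setNames(d, sub("^Q([^_]*_){2}", "", names(d)))
})

# For each data frame, the columns in col_keep that are absent after renaming
missing_cols <- lapply(renamed, function(d) setdiff(col_keep, names(d)))
missing_cols <- missing_cols[lengths(missing_cols) > 0]
missing_cols
# $y2022
# [1] "A"
```

An empty result means every data frame can be subset with `col_keep` without error.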