R: apply functions to a list of data frames

Question

I need help applying functions over a list of data frames.

I don't use lists or functions very often, so after 3 hours of searching and testing I need some help.

I have a list of 2 data frames like the ones below (the real list has more than 40):

df1 <- structure(list(ID = 1:4, 
    Period = c("C_2021", "C_2021", "C_2021", "C_2021"), 
    subjects = c(2044L, 2044L, 2058L, 2059L), 
    Q_1_A = c(1L, 1L, 4L, 6L), 
    Q_1_B = c(6L, 1L, 6L, NA), 
    col3 = c(4L, 6L, 5L, 2L), 
    col4 = c(3L, 5L, 4L, 4L)), 
    class = "data.frame", row.names = c(NA, -4L))
        
df2 <- structure(list(ID = 1:4, 
    Period = c("C_2022", "C_2022", "C_2022", "C_2022"), 
    subjects = c(2058L, 2058L, 2065L, 2066L), 
    Q_1_A = c(2L, 5L, 5L, 6L), 
    Q_1_B = c(6L, 1L, 4L, NA), 
    col3 = c(NA, 6L, 5L, 3L), 
    col4 = c(3L, 6L, 5L, 5L)), 
    class = "data.frame", row.names = c(NA, -4L))

The data sets look like this:

    df1
      ID Period subjects Q_1_A Q_1_B col3 col4
    1  1 C_2021     2044     1     6    4    3
    2  2 C_2021     2044     1     1    6    5
    3  3 C_2021     2058     4     6    5    4
    4  4 C_2021     2059     6    NA    2    4
    
    df2
      ID Period subjects Q_1_A Q_1_B col3 col4
    1  1 C_2022     2058     2     6   NA    3
    2  2 C_2022     2058     5     1    6    6
    3  3 C_2022     2065     5     4    5    5
    4  4 C_2022     2066     6    NA    3    5

The list of data frames:

dflist <- list(df1, df2)

I want to do two things:

1. Conditionally remove the text up to the second underscore

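I want to remove everything up to and including the second underscore, but only in column names that start with "Q". For example, "Q_1_A" should become "A". No other columns should be affected.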

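Note: the conditional (`ifelse`) is important. The real data has other columns with two underscores that must not be modified, and the column order can differ between data frames, so this has to be done by column name.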

# doesn't work (can't seem to get purrr working either)
dflist <- lapply(dflist, function(x) {
  names(x) <- ifelse(starts_with(names(x), "Q"), sub("^[^_]*_", "", names(x)), .x)
  x})
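The attempt above fails for three reasons: `starts_with()` is a tidyselect helper that only works inside selecting functions such as `select()` or `rename_with()`, so base R's `startsWith()` is needed here; `.x` is a purrr lambda placeholder and is undefined inside a plain `function(x)`; and `^[^_]*_` only strips the text up to the *first* underscore. A minimal corrected sketch, assuming every "Q" column has the `Q_<n>_<suffix>` shape (the data frame `d` and the helper `rename_q` are hypothetical, for illustration only):

```r
# Hypothetical example data: one Q column plus another two-underscore
# column that must stay untouched.
d <- data.frame(ID = 1:2, Q_1_A = c(1L, 4L), keep_me_too = c(5L, 6L))

rename_q <- function(x) {
  nm <- names(x)
  # Only names starting with "Q" are rewritten; "^Q[^_]*_[^_]*_" strips
  # everything up to and including the second underscore.
  names(x) <- ifelse(startsWith(nm, "Q"), sub("^Q[^_]*_[^_]*_", "", nm), nm)
  x  # return the modified data frame, not just the names
}

names(rename_q(d))
# returns c("ID", "A", "keep_me_too")
```

Applied to the whole list, this would be `lapply(dflist, rename_q)`.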

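2. After renaming, drop every column that is not in a given list.
Note: the real data frames have many columns, so listing the columns to keep is much easier than listing the ones to drop.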

Assuming the renaming above has already been done, the list of columns to keep is:

col_keep <- c("ID", "Period", "subjects", "A", "B")

# doesn't work
dflist <- lapply(dflist, function(x) {
  x[(names(x) %in% col_keep)]
  x})
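This attempt does compute the subset, but the result of `x[...]` is thrown away: the function's last expression is `x`, so it returns the original data frame unchanged. Returning the subset itself fixes it. A minimal sketch with a made-up, already-renamed data frame (`keep_cols` is a hypothetical helper name):

```r
col_keep <- c("ID", "Period", "subjects", "A", "B")

# Hypothetical renamed data frame with one extra column to drop
d <- data.frame(ID = 1:2, Period = "C_2021", subjects = c(2044L, 2058L),
                A = c(1L, 4L), B = c(6L, NA), col3 = c(4L, 5L))

keep_cols <- function(x) {
  # The last expression is the return value, so return the subset itself
  x[names(x) %in% col_keep]
}

out <- lapply(list(d), keep_cols)
names(out[[1]])
# returns c("ID", "Period", "subjects", "A", "B")
```

Note that `%in%` keeps the columns in the data frame's own order, not in the order of `col_keep`.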

**UPDATE** I think the following actually works:
library(dplyr)
dflist <- lapply(dflist, function(x) {
  x %>% select(any_of(col_keep))
})
# is this the best way to do it?
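`any_of()` silently skips names that are missing from a data frame, which helps when the 40+ data frames do not all share the same columns. If you would rather avoid the dplyr dependency, a base R sketch with similar behaviour intersects the keep list with the existing names (`keep_any_of` is a hypothetical helper, not part of any package):

```r
keep_any_of <- function(x, keep) {
  # intersect() keeps the order of `keep` and drops names that x lacks,
  # similar to select(any_of(keep))
  x[intersect(keep, names(x))]
}

# Made-up data frame without a "Period" column
d <- data.frame(B = 1:2, ID = 3:4, col3 = 5:6)
kept <- keep_any_of(d, c("ID", "Period", "B"))
names(kept)
# returns c("ID", "B")
```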

Any help would be appreciated.

Answer 1

For the first part, apply this:

dflist <- lapply(dflist, function(x) {
  names(x) <- ifelse(startsWith(names(x), "Q"),
                     gsub("[Q_0-9]+", "", names(x)), names(x))
  x
})
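Note that `[Q_0-9]+` is not anchored: it deletes every `Q`, underscore and digit anywhere in the name, not just the `Q_<n>_` prefix. That is fine for `Q_1_A`, but a name such as the hypothetical `Q_2_AB_3` would also lose its trailing `_3`. A small comparison with the anchored pattern used in another answer:

```r
nm <- c("Q_1_A", "Q_2_AB_3")   # the second name is a made-up edge case

# Unanchored character class: strips Q, "_" and digits everywhere
unanchored <- gsub("[Q_0-9]+", "", nm)
# returns c("A", "AB")

# Anchored: strips only a leading "Q" followed by two "<text>_" chunks
anchored <- sub("^Q([^_]*_){2}", "", nm)
# returns c("A", "AB_3")
```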

For the second part:

col_keep <- c("ID", "Period", "subjects", "A", "B")
dflist <- lapply(dflist, function(x) subset(x , select = col_keep))
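`subset(select = col_keep)` works when every data frame contains all of `col_keep`; if one of them lacks a column, indexing by the missing name stops with an "undefined columns selected" error. A sketch of a quick pre-check with `setdiff()`, using a made-up list in which the second data frame has no `B` column:

```r
col_keep <- c("ID", "Period", "subjects", "A", "B")

# Hypothetical list: the second data frame is missing "B"
dfs <- list(
  data.frame(ID = 1, Period = "C_2021", subjects = 2044, A = 1, B = 6),
  data.frame(ID = 1, Period = "C_2022", subjects = 2058, A = 2)
)

# Report which wanted columns each data frame is missing
missing_cols <- lapply(dfs, function(x) setdiff(col_keep, names(x)))
# returns list(character(0), "B")

# subset(select = col_keep) fails on the second data frame
res <- tryCatch(lapply(dfs, function(x) subset(x, select = col_keep)),
                error = function(e) conditionMessage(e))
```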

Answer 2

In base R:

lapply(dflist, \(x)setNames(x, sub('^Q([^_]*_){2}', '', names(x)))[col_keep])
[[1]]
  ID Period subjects A  B
1  1 C_2021     2044 1  6
2  2 C_2021     2044 1  1
3  3 C_2021     2058 4  6
4  4 C_2021     2059 6 NA

[[2]]
  ID Period subjects A  B
1  1 C_2022     2058 2  6
2  2 C_2022     2058 5  1
3  3 C_2022     2065 5  4
4  4 C_2022     2066 6 NA
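The one-liner combines both steps. Unrolled, with `function(x)` in place of the `\(x)` shorthand (which needs R 4.1.0 or later), it might read as follows; `df1` is rebuilt in compact form so the snippet runs on its own, and `clean_df` is a hypothetical name:

```r
col_keep <- c("ID", "Period", "subjects", "A", "B")
df1 <- data.frame(ID = 1:4, Period = "C_2021",
                  subjects = c(2044L, 2044L, 2058L, 2059L),
                  Q_1_A = c(1L, 1L, 4L, 6L), Q_1_B = c(6L, 1L, 6L, NA),
                  col3 = c(4L, 6L, 5L, 2L), col4 = c(3L, 5L, 4L, 4L))

clean_df <- function(x) {
  # Step 1: "^Q([^_]*_){2}" only matches names starting with "Q", so other
  # names pass through sub() unchanged and no ifelse() is needed.
  x <- setNames(x, sub("^Q([^_]*_){2}", "", names(x)))
  # Step 2: keep the wanted columns, in the order given by col_keep
  x[col_keep]
}

result <- lapply(list(df1), clean_df)
```

Because the pattern is anchored at `^Q`, other two-underscore columns are left alone, which covers the constraint stated in the question.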

In the tidyverse:

library(tidyverse)
dflist %>%
  map(~rename_with(.,~str_remove(.,'([^_]+_){2}'), starts_with('Q'))%>%
        select(all_of(col_keep)))

[[1]]
  ID Period subjects A  B
1  1 C_2021     2044 1  6
2  2 C_2021     2044 1  1
3  3 C_2021     2058 4  6
4  4 C_2021     2059 6 NA

[[2]]
  ID Period subjects A  B
1  1 C_2022     2058 2  6
2  2 C_2022     2058 5  1
3  3 C_2022     2065 5  4
4  4 C_2022     2066 6 NA

Answer 3

Another base R solution:

# wrap up code for ease of reading
validate_names <- function(df) {
  setNames(df, ifelse(grepl("^Q", names(df)),
                      gsub("[Q_0-9]", "", names(df)), names(df)))
}

# lapply to transform list, then subset with character vector
lapply(dflist, validate_names) |> 
lapply(`[`, col_keep)
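The second `lapply()` passes the extraction operator `` `[` `` as the function and forwards `col_keep` as its extra argument, so it is shorthand for `lapply(l, function(df) df[col_keep])`. The native pipe `|>` also requires R 4.1.0 or later. A small check of that equivalence on made-up data:

```r
col_keep <- c("ID", "A")
l <- list(data.frame(ID = 1:2, A = 3:4, col3 = 5:6),
          data.frame(ID = 7L, col3 = 8L, A = 9L))

short <- lapply(l, `[`, col_keep)             # operator passed as FUN
long  <- lapply(l, function(df) df[col_keep]) # explicit anonymous function
identical(short, long)
# returns TRUE
```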

3 answers

Answer 1
Score 1, accepted, 2022-06-21 23:43:05

Answer 2
Score 1, 2022-06-21 23:50:01

Answer 3
Score 1, 2022-06-22 00:01:28
