
How do I drop numbers from the start of column names? (preferably through tidyverse)

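I'm working on a task where I need to bind several survey datasets together, but unfortunately the survey questions are numbered inconsistently (the wording is consistent). To work around this, I'd like to remove the question number from the start of each question.

Currently I'm doing this manually with rename(), but repeating it for every question in every dataset is time-consuming. Any tips for a faster, more efficient way to do this?

Here is a sample dataset and my current process: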

library(dplyr)

# check.names = FALSE keeps the original column names so rename() can find them
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question`  = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question`  = c(1, 2, 0, 2, 1),
                  Year = 2021, check.names = FALSE) %>%
  rename(`First Question`  = `1. First Question`,
         `Second Question` = `2. Second Question`,
         `Third Question`  = `3. Third Question`)

df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question`  = c('a', 'b', 'c', 'd', 'e'),
                  `2. Third Question`  = c(2, 1, 3, 1, 2),
                  `3. Second Question` = c(2, 2, 1, 3, 2),
                  Year = 2022, check.names = FALSE) %>%
  rename(`First Question`  = `1. First Question`,
         `Second Question` = `3. Second Question`,
         `Third Question`  = `2. Third Question`)

# rbind() matches data frame columns by name, so the differing order is fine
end_df <- rbind(df1, df2)

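You can use rename_with, which applies a function (here sub) to change the column names according to a regular expression pattern. The pattern below targets names such as X1..First.Question, which is what data.frame() produces by default: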

library(dplyr)

df1 %>%
    rename_with(~ sub("^X\\d\\.\\.", "", .))
  ID First.Question Second.Question Third.Question Year
1  1              a               1              1 2021
2  2              b               1              2 2021
3  3              c               3              0 2021
4  4              d               0              2 2021
5  5              e               1              1 2021

As @zephryl pointed out, you can do this for all data frames in one go by putting them in a list:

library(purrr)

list(df1, df2) %>%
  map(rename_with, ~ sub("^X\\d\\.\\.", "", .))

Data:

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021)

df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Third Question` = c(2, 1, 3, 1, 2),
                  `3. Second Question` = c(2, 2, 1, 3, 2),
                  Year = 2022)

Use a regular expression with stringr::str_remove() inside dplyr::rename_with() for each data frame:

library(purrr)
library(dplyr)
library(stringr)

list(df1, df2) %>%
  map(rename_with, ~ str_remove(.x, "^\\d\\.\\s")) %>%
  bind_rows()
# A tibble: 10 × 5
      ID `First Question` `Second Question` `Third Question`  Year
   <dbl> <chr>                        <dbl>            <dbl> <dbl>
 1     1 a                                1                1  2021
 2     2 b                                1                2  2021
 3     3 c                                3                0  2021
 4     4 d                                0                2  2021
 5     5 e                                1                1  2021
 6     1 a                                2                2  2022
 7     2 b                                2                1  2022
 8     3 c                                1                3  2022
 9     4 d                                3                1  2022
10     5 e                                2                2  2022
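The pattern "^\\d\\.\\s" above matches a single digit only, so a column such as "12. Twelfth Question" would be left untouched. A minimal sketch of a more permissive pattern follows, assuming the prefixes always have the form "<number>. "; the helper name strip_question_number is hypothetical and not part of the original answers.

```r
# Hypothetical helper (not from the original answers): strips a leading
# "<number>. " prefix, allowing question numbers with more than one digit.
strip_question_number <- function(x) {
  sub("^[0-9]+\\.\\s*", "", x)
}

nms <- c("ID", "1. First Question", "12. Twelfth Question", "Year")
cleaned <- strip_question_number(nms)
print(cleaned)
```

Since rename_with() accepts any function of the column names, a helper like this could be passed directly, e.g. rename_with(df1, strip_question_number), provided the data frame was created with check.names = FALSE.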

Base R alternative

colnames(df1)[2:4] <- sub("^[0-9]\\. ", "", colnames(df1)[2:4])
colnames(df2)[2:4] <- sub("^[0-9]\\. ", "", colnames(df2)[2:4])

rbind(df1, df2)
   ID First Question Second Question Third Question Year
1   1              a               1              1 2021
2   2              b               1              2 2021
3   3              c               3              0 2021
4   4              d               0              2 2021
5   5              e               1              1 2021
6   1              a               2              2 2022
7   2              b               2              1 2022
8   3              c               1              3 2022
9   4              d               3              1 2022
10  5              e               2              2 2022
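The lines above hard-code the column positions 2:4 and repeat the call once per data frame. As a rough sketch of the same base R idea applied to a whole list, the following uses lapply() over every column name and then binds the results. The shortened example data here is made up for illustration, and it assumes the data frames were created with check.names = FALSE.

```r
# Illustrative data (shortened); question order differs between the two frames
df1 <- data.frame(ID = c(1, 2, 3),
                  `1. First Question`  = c("a", "b", "c"),
                  `2. Second Question` = c(1, 1, 3),
                  Year = 2021, check.names = FALSE)
df2 <- data.frame(ID = c(1, 2, 3),
                  `1. Second Question` = c(2, 2, 1),
                  `2. First Question`  = c("a", "b", "c"),
                  Year = 2022, check.names = FALSE)

# Strip the leading "<number>. " from every column name of every data frame;
# names without such a prefix (ID, Year) are left unchanged by sub()
dfs <- lapply(list(df1, df2), function(d) {
  names(d) <- sub("^[0-9]+\\. ", "", names(d))
  d
})

# rbind() on data frames matches columns by name, not by position
combined <- do.call(rbind, dfs)
print(combined)
```

Because the pattern is anchored with ^ and requires digits followed by a period and a space, applying it to all names avoids having to know where the question columns sit.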

Important side note: create the data frames with check.names = F, otherwise the names will be replaced with something like X1..First.Question.

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021, check.names = F)
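To see the effect of check.names directly, a small sketch comparing the two settings is given below. With the default check.names = TRUE, data.frame() runs the names through make.names(), which is where the X prefix and the dots come from. The variable names mangled and kept are only illustrative.

```r
# Default: names are made syntactically valid via make.names()
mangled <- data.frame(`1. First Question` = c("a", "b"))

# check.names = FALSE: names are kept exactly as written
kept <- data.frame(`1. First Question` = c("a", "b"), check.names = FALSE)

print(names(mangled))
print(names(kept))

# The same transformation applied to a plain string
print(make.names("1. First Question"))
```

This is why a pattern such as "^X\\d\\.\\." is needed for data frames created with the default setting, while "^\\d\\.\\s" or "^[0-9]\\. " only works when check.names = FALSE was used.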

