
How do I drop numbers from the start of column names? (preferably through tidyverse)

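I'm working on a task where I need to bind several survey datasets together, but unfortunately the survey questions are numbered inconsistently (the wording is consistent). To get around this, I'd like to remove the question number from the start of each question.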

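Currently I'm doing this manually with rename(), but repeating it for every question in every dataset is time-consuming. Any tips for doing this in a faster, more efficient way?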

Here is a sample dataset and my current process:

library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question`  = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question`  = c(1, 2, 0, 2, 1),
                  Year = 2021,
                  check.names = FALSE) %>%  # keep the original names so rename() can find them
  rename(`First Question`  = `1. First Question`,
         `Second Question` = `2. Second Question`,
         `Third Question`  = `3. Third Question`)

df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question`  = c('a', 'b', 'c', 'd', 'e'),
                  `2. Third Question`  = c(2, 1, 3, 1, 2),
                  `3. Second Question` = c(2, 2, 1, 3, 2),
                  Year = 2022,
                  check.names = FALSE) %>%  # keep the original names so rename() can find them
  rename(`First Question`  = `1. First Question`,
         `Second Question` = `3. Second Question`,
         `Third Question`  = `2. Third Question`)

end_df <- rbind(df1, df2)

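You can use rename_with(), which applies a function (here sub()) to change the column names based on a regex pattern. The pattern starts with X because data.frame() by default turns a name like `1. First Question` into X1..First.Question: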

df1 %>%
    rename_with(~ sub("^X\\d\\.\\.", "", .))
  ID First.Question Second.Question Third.Question Year
1  1              a               1              1 2021
2  2              b               1              2 2021
3  3              c               3              0 2021
4  4              d               0              2 2021
5  5              e               1              1 2021
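
To make the X in that pattern less mysterious, here is a minimal base R sketch of what happens to the names. It assumes the default check.names = TRUE behaviour of data.frame(), which runs names through make.names(). The variable names raw_names, mangled and cleaned are only illustrative, and the pattern uses \\d+ instead of \\d, a small variation on the answer so that question numbers of 10 and above would also be covered.

```r
# Survey-style names as they appear in the question.
raw_names <- c("ID", "1. First Question", "2. Second Question", "Year")

# make.names() is what data.frame() applies when check.names = TRUE:
# a leading digit gets an "X" prefix and spaces become dots.
mangled <- make.names(raw_names)
print(mangled)

# Strip the "X<number>.." prefix; names without that prefix are left untouched.
cleaned <- sub("^X\\d+\\.\\.", "", mangled)
print(cleaned)
```

Note that the cleaned names still contain dots instead of spaces, because the original spaces were already lost when the data frame was created.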

As @zephryl pointed out, you can do this for a list of all the data frames in one go:

list(df1, df2) %>%
  map(rename_with, ~ sub("^X\\d\\.\\.", "", .))

Data:

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021)

df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Third Question` = c(2, 1, 3, 1, 2),
                  `3. Second Question` = c(2, 2, 1, 3, 2),
                  Year = 2022)

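Use a regex with stringr::str_remove() inside dplyr::rename_with() for each data frame (this assumes the original column names were kept, e.g. the data frames were created with check.names = FALSE):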

library(purrr)
library(dplyr)
library(stringr)

list(df1, df2) %>%
  map(rename_with, ~ str_remove(.x, "^\\d\\.\\s")) %>%
  bind_rows()
# A tibble: 10 × 5
      ID `First Question` `Second Question` `Third Question`  Year
   <dbl> <chr>                        <dbl>            <dbl> <dbl>
 1     1 a                                1                1  2021
 2     2 b                                1                2  2021
 3     3 c                                3                0  2021
 4     4 d                                0                2  2021
 5     5 e                                1                1  2021
 6     1 a                                2                2  2022
 7     2 b                                2                1  2022
 8     3 c                                1                3  2022
 9     4 d                                3                1  2022
10     5 e                                2                2  2022
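
If the real surveys have more than nine questions, the single \\d in the pattern above would no longer match the whole number. The following is a sketch of one possible variation, not the answerer's code: strip_question_number() is a hypothetical helper, df_a and df_b are made-up toy data, and .cols = matches(...) is used so that only columns starting with a number are touched.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical helper: removes a leading "<number>. " prefix of any length.
strip_question_number <- function(x) str_remove(x, "^\\d+\\.\\s*")

# Made-up toy data with multi-digit, inconsistent question numbers.
df_a <- data.frame(ID = 1:2,
                   `9. First Question`   = c("a", "b"),
                   `10. Second Question` = c(1, 3),
                   Year = 2021,
                   check.names = FALSE)
df_b <- data.frame(ID = 1:2,
                   `12. Second Question` = c(2, 2),
                   `7. First Question`   = c("c", "d"),
                   Year = 2022,
                   check.names = FALSE)

# Rename only the numbered columns in each data frame, then stack them.
# bind_rows() matches columns by name, so the differing order does not matter.
combined <- list(df_a, df_b) %>%
  map(~ rename_with(.x, strip_question_number, .cols = matches("^\\d+\\."))) %>%
  bind_rows()

names(combined)
```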

Base R alternative

colnames(df1)[2:4] <- sub("^[0-9]\\. ", "", colnames(df1)[2:4])
colnames(df2)[2:4] <- sub("^[0-9]\\. ", "", colnames(df2)[2:4])

rbind(df1, df2)
   ID First Question Second Question Third Question Year
1   1              a               1              1 2021
2   2              b               1              2 2021
3   3              c               3              0 2021
4   4              d               0              2 2021
5   5              e               1              1 2021
6   1              a               2              2 2022
7   2              b               2              1 2022
8   3              c               1              3 2022
9   4              d               3              1 2022
10  5              e               2              2 2022
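
The same base R idea can be applied to any number of data frames without hard-coding column positions, since sub() leaves names that do not match the pattern unchanged. This is a sketch under assumptions: dfs is a made-up list of toy data frames created with check.names = FALSE, and the pattern uses [0-9]+ so that multi-digit numbers would also be stripped.

```r
# Made-up toy data; check.names = FALSE keeps the numbered names intact.
dfs <- list(
  data.frame(ID = 1:2,
             `1. First Question`  = c("a", "b"),
             `2. Second Question` = c(1, 3),
             check.names = FALSE),
  data.frame(ID = 1:2,
             `2. First Question`  = c("c", "d"),
             `1. Second Question` = c(2, 2),
             check.names = FALSE)
)

# Strip the leading "<number>. " from every column name of every data frame.
cleaned <- lapply(dfs, function(d) {
  names(d) <- sub("^[0-9]+\\. *", "", names(d))
  d
})

# Stack all cleaned data frames; rbind() matches columns by name.
end_df <- do.call(rbind, cleaned)
names(end_df)
```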

Important side note: create the data frames with check.names = FALSE, otherwise the names are replaced with something like X1..First.Question.

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021, check.names = FALSE)

