How do I drop numbers from the start of column names? (preferably through tidyverse)
I'm working through a task where I need to bind a few survey datasets, but unfortunately the survey questions are inconsistently numbered (the wording is consistent). To solve this, I want to drop the question number from the start of each question.
Currently I am doing this manually with rename(), but it is time-consuming to repeat for every question across each dataset. Any tips for doing this in a quicker, more efficient way?
Here's an example dataset and my current process:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021) %>%
  rename(`First Question` = `1. First Question`,
         `Second Question` = `2. Second Question`,
         `Third Question` = `3. Third Question`)

df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Third Question` = c(2, 1, 3, 1, 2),
                  `3. Second Question` = c(2, 2, 1, 3, 2),
                  Year = 2022) %>%
  rename(`First Question` = `1. First Question`,
         `Second Question` = `3. Second Question`,
         `Third Question` = `2. Third Question`)

end_df <- rbind(df1, df2)
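You can use rename_with(), which applies a function (here sub()) to the column names and changes them based on a regex pattern. Because data.frame() by default converts a name such as `1. First Question` to X1..First.Question, the pattern below targets that converted form: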
library(dplyr)

df1 %>%
  rename_with(~ sub("^X\\d\\.\\.", "", .))
  ID First.Question Second.Question Third.Question Year
1  1              a               1              1 2021
2  2              b               1              2 2021
3  3              c               3              0 2021
4  4              d               0              2 2021
5  5              e               1              1 2021
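To make the pattern above concrete, here is a small sketch that applies it to a plain character vector instead of a data frame. The name "X12..Twelfth.Question" is an invented example (it is not in the question's data) and is included only to show that `\\d` matches exactly one digit, so a two-digit question number would need `\\d+`.

```r
# Column names as data.frame() produces them by default.
# "X12..Twelfth.Question" is a hypothetical two-digit case added for illustration.
mangled <- c("ID", "X1..First.Question", "X12..Twelfth.Question", "Year")

# ^X      the "X" that R prepends to a name starting with a digit
# \\d     exactly one digit
# \\.\\.  two dots: the original "." plus the space that was converted to "."
single_digit <- sub("^X\\d\\.\\.", "", mangled)

# \\d+ accepts one or more digits, so "X12.." is stripped as well
any_digits <- sub("^X\\d+\\.\\.", "", mangled)

print(single_digit)
print(any_digits)
```

With the single-digit pattern the hypothetical two-digit name is left unchanged; the `\\d+` variant strips both prefixes. Names without a matching prefix, such as ID and Year, pass through untouched in either case.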
As noted by @zephryl, you can do it in one go for a list of all your data frames with purrr::map():
library(purrr)  # for map(); dplyr is assumed to be loaded already
list(df1, df2) %>%
  map(rename_with, ~ sub("^X\\d\\.\\.", "", .))
Data:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021)

df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Third Question` = c(2, 1, 3, 1, 2),
                  `3. Second Question` = c(2, 2, 1, 3, 2),
                  Year = 2022)
Using a regex with stringr::str_remove() inside dplyr::rename_with() for each data frame. The pattern assumes the column names were kept exactly as typed (for example, data frames created with check.names = FALSE):
library(purrr)
library(dplyr)
library(stringr)

list(df1, df2) %>%
  map(rename_with, ~ str_remove(.x, "^\\d\\.\\s")) %>%
  bind_rows()
# A tibble: 10 × 5
      ID `First Question` `Second Question` `Third Question`  Year
   <dbl> <chr>                        <dbl>            <dbl> <dbl>
 1     1 a                                1                1  2021
 2     2 b                                1                2  2021
 3     3 c                                3                0  2021
 4     4 d                                0                2  2021
 5     5 e                                1                1  2021
 6     1 a                                2                2  2022
 7     2 b                                2                1  2022
 8     3 c                                1                3  2022
 9     4 d                                3                1  2022
10     5 e                                2                2  2022
A base R alternative
colnames(df1)[2:4] <- sub("^[0-9]\\. ", "", colnames(df1)[2:4])
colnames(df2)[2:4] <- sub("^[0-9]\\. ", "", colnames(df2)[2:4])
rbind(df1, df2)
   ID First Question Second Question Third Question Year
1   1              a               1              1 2021
2   2              b               1              2 2021
3   3              c               3              0 2021
4   4              d               0              2 2021
5   5              e               1              1 2021
6   1              a               2              2 2022
7   2              b               2              1 2022
8   3              c               1              3 2022
9   4              d               3              1 2022
10  5              e               2              2 2022
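If there are more than two data frames, repeating the colnames() line for each one gets tedious. A possible base R sketch wraps the same sub() call in a small helper and applies it to a list. The helper name strip_question_number is made up for this example, `\\d+` is used instead of `[0-9]` so that multi-digit numbers would also match, and the data frames are assumed to have been created with check.names = FALSE.

```r
# Example data from the question, built with check.names = FALSE so the
# column names keep their leading numbers and spaces.
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021, check.names = FALSE)
df2 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Third Question` = c(2, 1, 3, 1, 2),
                  `3. Second Question` = c(2, 2, 1, 3, 2),
                  Year = 2022, check.names = FALSE)

# Hypothetical helper: remove a leading "<digits>. " from every column name.
# Names without such a prefix (ID, Year) are returned unchanged.
strip_question_number <- function(df) {
  names(df) <- sub("^\\d+\\.\\s*", "", names(df))
  df
}

# Clean every data frame, then stack them; rbind() matches columns by name,
# so the different question order in df2 does not matter.
combined <- do.call(rbind, lapply(list(df1, df2), strip_question_number))
print(names(combined))
```

Since the helper touches all column names rather than positions 2:4, it does not depend on where the question columns sit in each data frame.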
Important side note: create the data frames with check.names = FALSE, otherwise the names get replaced with something like X1..First.Question, and so on.
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  `1. First Question` = c('a', 'b', 'c', 'd', 'e'),
                  `2. Second Question` = c(1, 1, 3, 0, 1),
                  `3. Third Question` = c(1, 2, 0, 2, 1),
                  Year = 2021, check.names = FALSE)
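The effect of check.names can be checked directly. The following is a minimal sketch using a throwaway one-column data frame (not the survey data) that compares the names produced with and without the argument:

```r
# Default behaviour: names are passed through make.names(), which prepends
# "X" to a leading digit and replaces the space with ".".
default_names <- names(data.frame(`1. First Question` = 1:2))

# With check.names = FALSE the name is kept exactly as typed.
kept_names <- names(data.frame(`1. First Question` = 1:2, check.names = FALSE))

print(default_names)  # [1] "X1..First.Question"
print(kept_names)     # [1] "1. First Question"
```

Which regex is appropriate therefore depends on how the data frame was created: a pattern starting with `^X` fits the default names, while a pattern such as `^[0-9]\\. ` fits names kept as typed.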