
Recode variable in R based on first characters of vector

I have a data frame that looks like this:

codes <- c('TFAA1', 'TFAA2', 'TFAA3', 'TFAA4', 'TFAB1', 'TFAB2', 'TFAB3', 'TFAB4')
scores <- c(4,3,2,2,4,5,1,2)
example <- data.frame(codes, scores)

I want to create a new column called code_group in which every code that starts with TFAA is labelled "Group1" and every code that starts with TFAB is labelled "Group2".

I have been playing with the recode function from the car package and the grepl function, but I'm failing miserably. Here's my attempt so far:

recode <- (codes, "%in% TFAA='Group1'; %in% TFAB='Group2'")

With dplyr and stringr you can get it done:

library(dplyr)
library(stringr)
example %>% 
  mutate(code_group = case_when(str_detect(codes, "^TFAA") ~ "Group1",
                              str_detect(codes, "^TFAB") ~ "Group2"))

case_when() lets you chain several if-then conditions; they are checked in order, and rows that match none of them get NA. str_detect() tests whether each string matches a pattern, and the ^ anchor in "^TFAA" restricts the match to the start of the string.
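If you would rather avoid the stringr dependency, base R's grepl() (which the question already mentions) performs the same anchored regex test as str_detect(), with the pattern as the first argument instead of the second. A minimal sketch, using a small made-up vector that includes a code (TFAC1) matching neither prefix, to show that unmatched codes stay NA just as they do when no case_when() condition applies:

```r
codes <- c("TFAA1", "TFAA2", "TFAB1", "TFAC1")  # TFAC1 is a hypothetical unmatched code

# grepl(pattern, x) returns a logical vector, like str_detect(x, pattern)
is_a <- grepl("^TFAA", codes)
is_b <- grepl("^TFAB", codes)

# Start from NA so codes matching neither prefix stay NA
code_group <- rep(NA_character_, length(codes))
code_group[is_a] <- "Group1"
code_group[is_b] <- "Group2"

code_group
# c("Group1", "Group1", "Group2", NA)
```

Because the two prefixes are mutually exclusive, the order of the two assignments does not matter here; with overlapping patterns the later assignment would win, whereas case_when() keeps the first match.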

In base R, nested ifelse() calls with startsWith() also work:

example$code_group <- ifelse(startsWith(codes, 'TFAA'), 'Group1', 
                      ifelse(startsWith(codes, 'TFAB'), 'Group2',
                             NA))
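startsWith() is a vectorised, fixed-string prefix test (no regular expressions involved), and it is case-sensitive. A quick sketch with a few made-up inputs showing what it does and does not match, and how the nested ifelse() falls through to NA:

```r
x <- c("TFAA1", "tfaa2", "XTFAA3", "TFA")  # made-up inputs

# Literal, case-sensitive prefix test
hits <- startsWith(x, "TFAA")
hits
# c(TRUE, FALSE, FALSE, FALSE)

# Codes matching neither prefix end up as NA
group <- ifelse(startsWith(x, "TFAA"), "Group1",
                ifelse(startsWith(x, "TFAB"), "Group2", NA))
group
# c("Group1", NA, NA, NA)
```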

We could extract the first four characters with substr, convert them to a factor, and set the labels to the group names we want:

example$code_group <-  with(example,  as.character(factor(substr(codes, 1, 4), 
              levels = c('TFAA', 'TFAB'), labels = c('Group1', 'Group2'))))
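To make the intermediate steps of this answer visible, here is a sketch on a small made-up vector. Because levels is listed explicitly, each prefix maps to the label in the same position regardless of sort order, and a prefix not listed in levels (the hypothetical TFAC) becomes NA:

```r
codes <- c("TFAA1", "TFAB2", "TFAC3")  # TFAC3 is a made-up code outside both groups

# Step 1: keep characters 1 to 4, i.e. the prefix
prefix <- substr(codes, 1, 4)
# c("TFAA", "TFAB", "TFAC")

# Step 2: levels[i] is relabelled as labels[i]; unlisted values become NA
grp <- factor(prefix, levels = c("TFAA", "TFAB"), labels = c("Group1", "Group2"))

# Step 3: drop the factor class to get a plain character vector
as.character(grp)
# c("Group1", "Group2", NA)
```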

We can use split<-, the replacement form of split():

example$group <- NA
split(example$group,substr(example$codes,1,4)) <- paste0("Group",1:2)
example
#   codes scores  group
# 1 TFAA1      4 Group1
# 2 TFAA2      3 Group1
# 3 TFAA3      2 Group1
# 4 TFAA4      2 Group1
# 5 TFAB1      4 Group2
# 6 TFAB2      5 Group2
# 7 TFAB3      1 Group2
# 8 TFAB4      2 Group2
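split<- splits the positions of x by f and writes the i-th element of value into every position of the i-th group. The groups follow the factor levels of f, which for a character vector are the sorted unique values, so in the answer above Group1 lands on TFAA only because TFAA sorts before TFAB. A small sketch with toy inputs:

```r
x <- rep(NA, 5)                    # logical NA, like example$group <- NA
f <- c("b", "a", "b", "a", "a")    # toy grouping vector

# Groups are taken in sorted level order ("a" before "b"),
# not in order of first appearance
split(x, f) <- c("value for a", "value for b")
x
# c("value for b", "value for a", "value for b", "value for a", "value for a")
```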

Or we can use factors for the same output (3 variants):

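# Variant 1: relabel the prefix levels as 1, 2, then paste "Group" in front
example$group <- paste0("Group", factor(substr(example$codes, 1, 4), labels = 1:2))
# Variant 2: use the integer codes of the prefix factor
example$group <- paste0("Group", as.numeric(factor(substr(example$codes, 1, 4))))
# Variant 3: label the levels directly (the result stays a factor)
example$group <- factor(substr(example$codes, 1, 4), labels = paste0("Group", 1:2))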

In the last case you get a factor column; in all other cases you get a character column.
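The factor-based variants number the groups by the sorted unique prefixes, so they implicitly assume exactly the two prefixes in the example. A sketch with a hypothetical third prefix (TFAC) shows the resulting column classes and how the variants react: the as.numeric() variant silently creates Group3, while passing two labels for three levels makes factor() stop with an error.

```r
codes <- c("TFAA1", "TFAB1", "TFAC1")  # TFAC1: hypothetical third prefix
prefix <- substr(codes, 1, 4)

# Variant 2: character result, numbering simply continues
chr <- paste0("Group", as.numeric(factor(prefix)))
class(chr)  # "character"
chr         # c("Group1", "Group2", "Group3")

# Variant 3 on the original two prefixes: factor result
fct <- factor(prefix[1:2], labels = paste0("Group", 1:2))
class(fct)  # "factor"

# Two labels for three distinct prefixes: factor() raises an error
res <- tryCatch(factor(prefix, labels = paste0("Group", 1:2)),
                error = function(e) "error")
res         # "error"
```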
