Creating new column based on character values in R
I have a data frame with a column called 'full_name' that lists two teams per row, for example:
• 'Man U to win Liverpool to win'
• 'Liverpool to win Man U to win'
• 'Chelsea to win Arsenal to win'
And so on.
I would like to classify the rows as North or South: if 'Man U to win Liverpool to win' or 'Liverpool to win Man U to win' appears, the row is coded as 'North', whereas if 'Chelsea to win Arsenal to win' appears, it is coded as 'South', and so on.
levels(raw_data$full_name)[levels(raw_data$full_name)== "Man U to win Liverpool to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Liverpool to win Man U to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Chelsea to win Arsenal to win"] <- 'South'
The code above does not produce any error, but the data frame remains unchanged and I do not get the desired output. Is there a way to do this?
Here is an example with a tidyverse approach that might help you:
library(dplyr)
north <- c("Man U to win Liverpool to win","Liverpool to win Man U to win")
south <- c("Chelsea to win Arsenal to win")
df <-
data.frame(full_name = sample(c(north,south),size = 5,replace = TRUE))
df %>%
mutate(region = case_when(
full_name %in% north ~ "North",
full_name %in% south ~ "South"
))
full_name region
1 Chelsea to win Arsenal to win South
2 Man U to win Liverpool to win North
3 Chelsea to win Arsenal to win South
4 Man U to win Liverpool to win North
5 Man U to win Liverpool to win North
Here is an option with fct_recode:
library(forcats)
raw_data$full_name <- with(raw_data, fct_recode(full_name,
North = "Man U to win Liverpool to win",
North = "Liverpool to win Man U to win",
South = "Chelsea to win Arsenal to win"))
Or using base R:
factor(raw_data$full_name, levels = c("Chelsea to win Arsenal to win",
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), labels = c("South", "North", "North"))
Or if we want to use levels:
lvls_to_change <- c("Man U to win Liverpool to win",
"Liverpool to win Man U to win", "Chelsea to win Arsenal to win")
lvsl_new <- c("North", "North", "South")
i1 <- levels(raw_data$full_name) %in% lvls_to_change
levels(raw_data$full_name)[i1] <- lvsl_new[match(levels(raw_data$full_name)[i1], lvls_to_change)]
# example data
raw_data <- structure(list(full_name = structure(c(2L, 2L, 3L, 2L,
1L), levels = c("Chelsea to win Arsenal to win",
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), class = "factor")), row.names = c(NA, -5L), class = "data.frame")
Here is an alternative approach:
library(dplyr)
library(stringr)
north <- c("Liverpool|Man")
south <- c("Chelsea|Arsenal")
df %>%
mutate(region = case_when(str_detect(full_name, north) ~ "North",
str_detect(full_name, south) ~ "South",
TRUE ~ NA_character_))
full_name region
1 Liverpool to win Man U to win North
2 Chelsea to win Arsenal to win South
3 Man U to win Liverpool to win North
4 Chelsea to win Arsenal to win South
5 Liverpool to win Man U to win North
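For readers who would rather not load stringr, the same pattern matching can be sketched in base R with grepl(). This is only an illustration: the `games` data frame and its third row are made up here and are not from the original post.

```r
# Hypothetical example data (not from the original post)
games <- data.frame(full_name = c(
  "Man U to win Liverpool to win",
  "Chelsea to win Arsenal to win",
  "Leeds to win Everton to win"
), stringsAsFactors = FALSE)

# Same regular expressions as in the stringr version
north <- "Liverpool|Man"
south <- "Chelsea|Arsenal"

# Start with NA so rows matching neither pattern stay missing
games$region <- NA_character_
# Assign South first and North last, so North wins if both patterns match,
# mirroring the order of the case_when() conditions
games$region[grepl(south, games$full_name)] <- "South"
games$region[grepl(north, games$full_name)] <- "North"
```

Note that both patterns match substrings, so "Man" would also match any other name containing those letters.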
In base R, your code will work as intended if you remove the levels() calls, provided full_name is a character column (levels() only applies to factors). You can call factor() after replacing the values if you want the column to be a factor.
# example data
raw_data <- data.frame(full_name = c(
"Man U to win Liverpool to win",
"Liverpool to win Man U to win",
"Chelsea to win Arsenal to win"
))
raw_data$full_name[raw_data$full_name == "Man U to win Liverpool to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Liverpool to win Man U to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Chelsea to win Arsenal to win"] <- "South"
raw_data$full_name <- factor(raw_data$full_name)
Alternatively, you can use a named vector as a lookup table:
lookup <- c(
"Man U to win Liverpool to win" = "North",
"Liverpool to win Man U to win" = "North",
"Chelsea to win Arsenal to win" = "South"
)
raw_data$full_name <- factor(lookup[raw_data$full_name])
Result from either approach:
#> raw_data
full_name
1 North
2 North
3 South
#> levels(raw_data$full_name)
[1] "North" "South"
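One possible explanation for why the levels() version in the question left the data unchanged, assuming full_name was a character column rather than a factor, is that a character vector has no levels to rename. The sketch below uses made-up data frames (`chr_data` and `fct_data` are hypothetical names) to contrast the two cases.

```r
# Character column (stringsAsFactors = FALSE is the default since R 4.0.0)
chr_data <- data.frame(full_name = c(
  "Man U to win Liverpool to win",
  "Chelsea to win Arsenal to win"
), stringsAsFactors = FALSE)

is.factor(chr_data$full_name)  # FALSE
levels(chr_data$full_name)     # NULL: there are no levels to rename

# Factor column: renaming a level recodes every row that uses it
fct_data <- data.frame(full_name = factor(chr_data$full_name))
levels(fct_data$full_name)[levels(fct_data$full_name) == "Man U to win Liverpool to win"] <- "North"
levels(fct_data$full_name)[levels(fct_data$full_name) == "Chelsea to win Arsenal to win"] <- "South"
```

Checking is.factor() or str() on the real column first should show which of the two situations applies.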