Creating new column based on character values in R

I have a data frame with a column called 'full_name' that presents two teams, for example:
• 'Man U to win Liverpool to win'
• 'Liverpool to win Man U to win'
• 'Chelsea to win Arsenal to win'
And so on…

I would like to be able to differentiate the teams into North and South, so that if 'Man U to win Liverpool to win' or 'Liverpool to win Man U to win' is presented, it is coded as 'North', whereas if 'Chelsea to win Arsenal to win' is presented, it is coded as 'South', and so on.

levels(raw_data$full_name)[levels(raw_data$full_name)== "Man U to win Liverpool to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Liverpool to win Man U to win"] <- 'North'
levels(raw_data$full_name)[levels(raw_data$full_name)== "Chelsea to win Arsenal to win"] <- 'South'

The code above does not produce any error; however, the data frame remains unchanged, so it is not producing the desired output. Is there a way to do this?

Here is an example with a tidyverse approach that might help you:

library(dplyr)

north <- c("Man U to win Liverpool to win","Liverpool to win Man U to win")
south <- c("Chelsea to win Arsenal to win")


df <- 
  data.frame(full_name = sample(c(north,south),size = 5,replace = TRUE))
             
df %>% 
  mutate(region = case_when(
    full_name %in% north ~ "North",
    full_name %in% south ~ "South"
  ))

                      full_name region
1 Chelsea to win Arsenal to win  South
2 Man U to win Liverpool to win  North
3 Chelsea to win Arsenal to win  South
4 Man U to win Liverpool to win  North
5 Man U to win Liverpool to win  North

Here is an option with fct_recode

library(forcats)
raw_data$full_name <- with(raw_data, fct_recode(full_name, 
   North =  "Man U to win Liverpool to win",
   North = "Liverpool to win Man U to win",
   South  =  "Chelsea to win Arsenal to win"))

Or using base R

factor(raw_data$full_name, levels = c("Chelsea to win Arsenal to win", 
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), labels = c("South", "North", "North"))
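
Note that a factor() call like the one above returns a new factor rather than modifying raw_data, so the result has to be assigned somewhere. A minimal sketch, assuming R 3.4.0 or later (where factor() accepts duplicated labels), made-up example data, and a hypothetical new column named region so that full_name is kept intact:

```r
# Example data (assumed for illustration; same strings as in the question)
raw_data <- data.frame(full_name = c(
  "Man U to win Liverpool to win",
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win"
))

# Map each original value to a region; the duplicated "North" labels
# are merged into a single level
raw_data$region <- factor(
  raw_data$full_name,
  levels = c("Chelsea to win Arsenal to win",
             "Liverpool to win Man U to win",
             "Man U to win Liverpool to win"),
  labels = c("South", "North", "North")
)

print(raw_data)
```

Any value of full_name that is not listed in levels would become NA in region.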

Or if we want to use levels

lvls_to_change <-  c("Man U to win Liverpool to win",
   "Liverpool to win Man U to win", "Chelsea to win Arsenal to win")
lvls_new <- c("North", "North", "South")
i1 <- levels(raw_data$full_name) %in% lvls_to_change
levels(raw_data$full_name)[i1] <- lvls_new[match(levels(raw_data$full_name)[i1], lvls_to_change)]

Data

raw_data <- structure(list(full_name = structure(c(2L, 2L, 3L, 2L,
 1L), levels = c("Chelsea to win Arsenal to win", 
"Liverpool to win Man U to win", "Man U to win Liverpool to win"
), class = "factor")), row.names = c(NA, -5L), class = "data.frame")

Here is an alternative approach:

library(dplyr)
library(stringr)

north <- c("Liverpool|Man")
south <- c("Chelsea|Arsenal")

df %>% 
  mutate(region = case_when(str_detect(full_name, north) ~ "North",
                            str_detect(full_name, south) ~ "South",
                            TRUE ~ NA_character_))
                      full_name region
1 Liverpool to win Man U to win  North
2 Chelsea to win Arsenal to win  South
3 Man U to win Liverpool to win  North
4 Chelsea to win Arsenal to win  South
5 Liverpool to win Man U to win  North
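
For comparison, the same pattern matching can be sketched in base R with grepl(), without loading dplyr or stringr. The example data (including the third row) and the region column name are assumptions for illustration. Rows that match neither pattern stay NA.

```r
# Example data; the last row is a hypothetical fixture that matches neither pattern
df <- data.frame(full_name = c(
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win",
  "Everton to win Spurs to win"
))

north <- "Liverpool|Man"
south <- "Chelsea|Arsenal"

# Start with NA, then fill in matches. "North" is assigned last so that it
# takes precedence if a row matches both patterns, mirroring the order of
# the case_when() conditions.
df$region <- NA_character_
df$region[grepl(south, df$full_name)] <- "South"
df$region[grepl(north, df$full_name)] <- "North"

print(df)
```

Because "Man" is a fairly loose pattern, it would also match other team names containing those letters, so a stricter pattern may be preferable on real data.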

In base R, your code will work as intended if you remove the levels() calls, provided the column is a character vector rather than a factor. You can call factor() after replacing values if you want the column to be a factor.

# example data
raw_data <- data.frame(full_name = c(
  "Man U to win Liverpool to win", 
  "Liverpool to win Man U to win",
  "Chelsea to win Arsenal to win"
))

raw_data$full_name[raw_data$full_name == "Man U to win Liverpool to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Liverpool to win Man U to win"] <- "North"
raw_data$full_name[raw_data$full_name == "Chelsea to win Arsenal to win"] <- "South"

raw_data$full_name <- factor(raw_data$full_name)
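
One possible reason the levels() version in the question appeared to do nothing is that the column was a character vector rather than a factor. This is an assumption, since the question does not show the column type. The sketch below, using made-up vectors, shows how to check the type and how levels() behaves in each case:

```r
chr_col <- c("Man U to win Liverpool to win", "Chelsea to win Arsenal to win")

is.factor(chr_col)   # FALSE
levels(chr_col)      # NULL: a character vector has no levels to rename

# The same values stored as a factor
fct_col <- factor(chr_col)
is.factor(fct_col)   # TRUE

# Renaming a level changes every element that carries that level
levels(fct_col)[levels(fct_col) == "Man U to win Liverpool to win"] <- "North"
levels(fct_col)[levels(fct_col) == "Chelsea to win Arsenal to win"] <- "South"

as.character(fct_col)
```

Running str(raw_data) or is.factor(raw_data$full_name) on the real data would show which of the two cases applies.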

Alternatively, you can use a named vector as a lookup table:

lookup <- c(
  "Man U to win Liverpool to win" = "North",
  "Liverpool to win Man U to win" = "North",
  "Chelsea to win Arsenal to win" = "South"
)

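# as.character() ensures the lookup is by name even if the column is a factor
raw_data$full_name <- factor(unname(lookup[as.character(raw_data$full_name)]))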

Result from either approach:

#> raw_data
  full_name
1     North
2     North
3     South

#> levels(raw_data$full_name)
[1] "North" "South"
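
A lookup table also makes it easy to spot values that have no mapping. The following is a minimal sketch with made-up input, including one hypothetical fixture that is missing from the lookup. It assumes the input may be a factor, which is why as.character() is applied before indexing.

```r
lookup <- c(
  "Man U to win Liverpool to win" = "North",
  "Liverpool to win Man U to win" = "North",
  "Chelsea to win Arsenal to win" = "South"
)

# Made-up input stored as a factor; the last value is not in the lookup
full_name <- factor(c(
  "Chelsea to win Arsenal to win",
  "Man U to win Liverpool to win",
  "Everton to win Spurs to win"
))

# Indexing with a factor directly would use its integer codes, so convert
# to character to look up by name; unname() drops the carried-over names
region <- unname(lookup[as.character(full_name)])

# Values without a mapping come back as NA and can be listed for review
unmatched <- unique(as.character(full_name)[is.na(region)])

print(region)
print(unmatched)
```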
