
Recode data values in a dataframe into combined values in R

I'm trying to compare marital status, and my variable has the values "married", "not married", "engaged", "single", and "nota married". How would I recode this data so it only reads "married" and "not married"? (engaged counts as married; single and nota married count as not married)

Sample dataset

df <- data.frame(mstatus = sample(x = c("married", 
                                        "not married", 
                                        "engaged", 
                                        "single", 
                                        "nota married"), 
                                  size = 15, replace = TRUE))

This is what I have so far

df2 <- df %>%
  mutate(mstatus = tolower(mstatus))

You can use mutate() together with case_when() from dplyr (part of the tidyverse):

library(dplyr)

df <- df %>% mutate(mstatus = case_when(
    mstatus %in% c("married", "engaged") ~ "married",
    # "nota married" is the misspelled value from the question's data
    mstatus %in% c("not married", "single", "nota married") ~ "not married"
))
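Because case_when() returns NA for any value that no condition matches, it can help to check first which values a mapping would miss. The following is a minimal base R sketch of such a check, not part of the original answer. The vectors `married_values` and `not_married_values` are hypothetical names, and "nota married" is deliberately left out of them to show how a misspelled value surfaces.

```r
# Example data from the question, including the misspelled "nota married"
df <- data.frame(
  mstatus = c("married", "not married", "engaged", "single", "nota married")
)

# Hypothetical helper vectors listing the values each condition covers
married_values <- c("married", "engaged")
not_married_values <- c("not married", "single")

# Values present in the data that neither group covers
unmapped <- setdiff(unique(df$mstatus), c(married_values, not_married_values))
unmapped
# [1] "nota married"
```

Any value reported here would need to be added to one of the groups before recoding.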

Probably the simplest base R approach is ifelse():

df2$mstatus_new <- ifelse(df2$mstatus == "engaged" | df2$mstatus == "married",
                          "married", "not married")

Data:

df2 <- data.frame(
  mstatus = c("married", "not married", "engaged", "single", "nota married"))
df2
       mstatus
1      married
2  not married
3      engaged
4       single
5 nota married

Result:

df2
       mstatus mstatus_new
1      married     married
2  not married not married
3      engaged     married
4       single not married
5 nota married not married
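
If more categories turn up later, a named lookup vector may scale better than a chain of conditions. This is a sketch of that alternative, assuming the same five values as above. The name `recode_map` is hypothetical and not from the original answer.

```r
df2 <- data.frame(
  mstatus = c("married", "not married", "engaged", "single", "nota married")
)

# Hypothetical lookup vector: names are the original values,
# elements are the recoded values
recode_map <- c(
  "married"      = "married",
  "engaged"      = "married",
  "not married"  = "not married",
  "single"       = "not married",
  "nota married" = "not married"
)

# as.character() guards against mstatus being a factor, which would
# otherwise index the lookup vector by integer code instead of by name
df2$mstatus_new <- unname(recode_map[as.character(df2$mstatus)])
df2
```

Unlike the ifelse() version, a value missing from the lookup vector comes back as NA rather than silently falling into "not married".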

If we need to recode 'mstatus', one option is forcats:

library(dplyr)
library(forcats)
df2 %>%
  mutate(mstatus = fct_recode(mstatus,
                              married = "engaged",
                              `not married` = "single"))
#      mstatus
#1     married
#2 not married
#3     married
#4 not married
#5 not married

Or, if there are many values to change, use fct_collapse(), which accepts a vector of old values for each new level:

df2 %>%
  mutate(mstatus = fct_collapse(mstatus,
                                married = c("engaged"),
                                `not married` = c("single")))

Data:

df2 <- structure(list(mstatus = structure(c(2L, 3L, 1L, 4L, 3L), .Label = c("engaged", 
"married", "not married", "single"), class = "factor")),
class = "data.frame", row.names = c(NA, 
-5L))
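
For comparison, the same collapsing of levels can be sketched without forcats by assigning a named list to levels(). This is a base R alternative, not the answer's own method, and it assumes the factor has the four levels shown in the data above.

```r
mstatus <- factor(c("married", "not married", "engaged", "single", "not married"))

# Each list name is a new level; its value lists the old levels merged into it.
# All four existing levels are listed here.
levels(mstatus) <- list(
  "married"     = c("married", "engaged"),
  "not married" = c("not married", "single")
)
mstatus
```

After the assignment the factor has only the two levels "married" and "not married".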

