Make a new variable with selected levels of another variable

I'm having trouble creating a new variable from selected levels of another variable. The data set is gss and the variable is class, which has 5 levels, "Lower Class", "Working Class", "Middle Class", "Upper Class" and "No Class", plus NA values.

If I run:

gss %>%
  select(class) %>%
  str()

it gives me:

'data.frame':   57061 obs. of  1 variable:
$ class: Factor w/ 5 levels "Lower Class",..: 3 3 2 3 2 3 3 2 2 2 ...

Since I am only interested in respondents who reported their economic class, I would like to drop the "No Class" level and the NAs. I didn't know a better way to do this, so I did:

gss <- gss %>%
  mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class",
                         ifelse(class == "Working Class", "Working Class",
                         ifelse(class == "Middle Class", "Middle Class",
                         ifelse(class == "Upper Class", "Upper Class", NA)))))

To check whether it worked, I ran:

with(gss, table(filteredclass))

which gave me the classes in a different (alphabetical) order:

filteredclass
  Lower Class  Middle Class   Upper Class Working Class 
         3147         24289          1741         24458 

I want the new variable filteredclass to list its levels in the same order as the variable class. If I do the same with class, I get:

with(gss, table(class))
class
Lower Class Working Class  Middle Class   Upper Class 
     3147         24458         24289          1741 
 No Class 
        1 

Is there any way to fix this? Also, is there a way to drop the No Class level without the mutate() command above?

Thanks for your help in advance!

In the future, it's much easier to help if you provide a reproducible example.

If you want to get rid of "No Class", you can use filter():

gss <- gss %>% 
  filter(class != "No Class") %>%
  droplevels()
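Here is a quick check of what that pipeline does on a tiny made-up factor (toy, hypothetical data rather than the real gss). It shows two side effects worth knowing: filter() also drops the NA row, because NA != "No Class" evaluates to NA, and droplevels() removes every empty level, not only "No Class".

```r
library(dplyr)

# Hypothetical toy stand-in for gss: one factor column with all five levels declared
toy <- data.frame(
  class = factor(c("Lower Class", "No Class", NA, "Upper Class"),
                 levels = c("Lower Class", "Working Class", "Middle Class",
                            "Upper Class", "No Class"))
)

kept <- toy %>%
  filter(class != "No Class") %>%  # the NA row is dropped too: NA != "No Class" is NA
  droplevels()                     # drops "No Class" and any other level with no rows

nrow(kept)
levels(kept$class)
```

On this toy data only two rows survive, and "Working Class" and "Middle Class" disappear along with "No Class" simply because no rows use them. In the real gss all four classes have observations, so only "No Class" is dropped.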

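To remove NAs, you can use the following, but be aware that na.omit() drops every row that has an NA in any column, not only in class: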

gss <- na.omit(gss)
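A small sketch of that difference on made-up data; the age column is hypothetical and just stands in for any other gss variable with missing values. Indexing with !is.na() removes only the rows where class itself is missing.

```r
# Hypothetical toy data: 'age' represents any other column that has NAs
toy <- data.frame(
  class = c("Lower Class", NA, "Middle Class", "Upper Class"),
  age   = c(25, 40, NA, 61)
)

all_complete <- na.omit(toy)              # drops row 2 (class NA) AND row 3 (age NA)
class_known  <- toy[!is.na(toy$class), ]  # drops only row 2

nrow(all_complete)
nrow(class_known)
```

With the tidyverse loaded, tidyr::drop_na(gss, class) is another way to drop only the rows where class is missing.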

The easiest way may be to call factor() on class with only the levels you want:

gss$filteredclass <- factor(gss$class, c("Lower Class", "Working Class",
                             "Middle Class", "Upper Class"))

Any value not among the supplied levels (here "No Class") becomes NA, and the levels keep the order in which you list them.
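To illustrate on a short made-up vector (hypothetical values, not the real gss), re-creating the factor with only the four wanted levels turns "No Class" into NA while keeping the original level order, so table() no longer sorts alphabetically.

```r
# Hypothetical toy factor with all five levels declared
x <- factor(c("Middle Class", "No Class", "Lower Class", NA, "Upper Class"),
            levels = c("Lower Class", "Working Class", "Middle Class",
                       "Upper Class", "No Class"))

# Keep only the four levels of interest; "No Class" values become NA
filtered <- factor(x, levels = c("Lower Class", "Working Class",
                                 "Middle Class", "Upper Class"))

levels(filtered)
as.character(filtered)
table(filtered)  # columns follow the level order; an unused level shows a count of 0
```

Note that table() keeps "Working Class" with a count of 0 here because the level is still declared; wrap the call in droplevels() if you want empty levels removed.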

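Because ifelse() returns a character vector rather than a factor, table() sorts the values alphabetically. To get the same order as gss$class, you have to turn filteredclass back into a factor with the same levels. You can do this by adding another line to your mutate() call that creates the factor with the levels of class and drops the unused level (No Class).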

library(tidyverse)
# Generate data similar to what you showed (NA values are included, but NA is not a level)
class_levels <- c("Lower Class", "Working Class", "Middle Class", "Upper Class", "No Class")
gss <- data.frame(class = sample(c(class_levels, NA), 45000, replace = TRUE)) %>%
  mutate(class = factor(class, levels = class_levels))

# Sampled data
with(gss, table(class, useNA = "always"))

# Mutate gss the way you did it
gss <-  gss %>%
  mutate(filteredclass = ifelse(class == "Lower Class", "Lower Class", 
                                ifelse(class == "Working Class", "Working Class",
                                       ifelse(class == "Middle Class", "Middle Class", 
                                              ifelse(class == "Upper Class", "Upper Class", NA)))),
         # Then make filteredclass into a factor with the same levels as class
         # Use droplevels() to remove unused classes (since we removed the No Class)
         filteredclass = droplevels(factor(filteredclass, levels = levels(class))))

with(gss, table(class))
with(gss, table(filteredclass))

The output is:

> with(gss, table(class, useNA = "always"))
class
  Lower Class Working Class  Middle Class   Upper Class      No Class 
         7362          7469          7626          7450          7457 
         <NA> 
         7636 

> with(gss, table(class))
class
  Lower Class Working Class  Middle Class   Upper Class      No Class 
         7362          7469          7626          7450          7457 

> with(gss, table(filteredclass))
filteredclass
  Lower Class Working Class  Middle Class   Upper Class 
         7362          7469          7626          7450 
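For anyone wondering why the order changed in the first place, here is a minimal base R sketch on a hypothetical four-value factor: ifelse() hands back a plain character vector, table() then sorts its names alphabetically, and converting back with factor(..., levels = levels(cls)) restores the intended order.

```r
# Hypothetical toy factor with a meaningful (non-alphabetical) level order
cls <- factor(c("Upper Class", "Lower Class", "Working Class", "Middle Class"),
              levels = c("Lower Class", "Working Class", "Middle Class", "Upper Class"))

keep <- c("Lower Class", "Working Class", "Middle Class", "Upper Class")
chr <- ifelse(cls %in% keep, as.character(cls), NA)  # result is character, levels are lost
class(chr)
names(table(chr))  # alphabetical order

fct <- factor(chr, levels = levels(cls))  # same levels as the original factor
names(table(fct))  # original order restored
```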

A much quicker way is to use droplevels() with its exclude argument instead of the chain of ifelse() statements:

# Turn "No Class" into NA and drop it from the levels (no rows are removed)
with(gss %>% mutate(filteredclass = droplevels(class, exclude = c(NA, "No Class"))),
     table(filteredclass))


filteredclass
  Lower Class Working Class  Middle Class   Upper Class 
         7362          7469          7626          7450 
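One thing to keep in mind with droplevels(exclude = ...): it recodes values, it does not remove observations. A minimal sketch on a hypothetical five-value factor shows that the length stays the same, the "No Class" values become NA, and you still need a separate !is.na() step if you actually want those rows gone.

```r
# Hypothetical toy factor with all five levels declared
cls <- factor(c("Lower Class", "No Class", NA, "Middle Class", "No Class"),
              levels = c("Lower Class", "Working Class", "Middle Class",
                         "Upper Class", "No Class"))

dropped <- droplevels(cls, exclude = c(NA, "No Class"))

length(dropped)      # unchanged: still 5 values
sum(is.na(dropped))  # the original NA plus the two former "No Class" values
levels(dropped)      # unused levels are dropped as well

kept <- dropped[!is.na(dropped)]  # remove the NA values if you need fewer rows
length(kept)
```

As with droplevels() in general, "Working Class" and "Upper Class" vanish here only because the toy vector never uses them.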
