R, drop unspecified factor levels

Question

I have a very sloppy, very large dataset that I am trying to clean up. One of the columns is labelled "Article Type" and it should only have 6 values: "Discussion", "Other", "Cohort Analysis", "Case Series", "Case Study", and "RCT".

It's stored numerically in the raw data, and I use this code to specify which is which:

library(dplyr)
library(forcats)
data$`Article Type` <- as.factor(data$`Article Type`)
data <- data %>% mutate(`Article Type` = fct_recode(`Article Type`, "RCT" = "1", "Cohort Analysis" = "2", "Case Series" = "3", "Case Study" = "4", "Discussion" = "5", "Other" = "6"))

The problem is that there's a LOT of messed-up data entry in this data set, and when I run this code:

data%>%count(`Article Type`)

Instead of counts of the 6 values I specified, I get this:

I know I can filter by doing something like:

data%>%filter(`Article Type`!="7")

or something, but I'd rather not write that out 30 times for every different value.

Is there a way to code something to the effect of: "If it wasn't one of these 6 levels, drop it"?

Answer 1

You could use the %in% operator to keep only the values you need, instead of excluding everything you do not need:

library(dplyr)

data%>%
  filter(`Article Type` %in% c("Discussion","Other","Cohort Analysis","Case Series","Case Study","RCT"))
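Note that filter() only removes the rows; the stray values can still linger as empty levels on the factor. Here is a minimal sketch of the whole pipeline on made-up data, assuming the bad entries look like "7" and "x" (the toy data frame and the names valid_types and cleaned are hypothetical, not from the question). It recodes, filters with %in%, and then calls base R's droplevels() to discard the unused levels:

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: codes 1-6 are valid, "7" and "x" are entry errors
data <- data.frame(
  `Article Type` = c("1", "2", "7", "3", "x", "4", "5", "6", "1"),
  check.names = FALSE
)

# The six labels that are allowed to survive
valid_types <- c("RCT", "Cohort Analysis", "Case Series",
                 "Case Study", "Discussion", "Other")

cleaned <- data %>%
  mutate(`Article Type` = fct_recode(
    as.factor(`Article Type`),
    "RCT" = "1", "Cohort Analysis" = "2", "Case Series" = "3",
    "Case Study" = "4", "Discussion" = "5", "Other" = "6"
  )) %>%
  # Keep only rows whose value is one of the six expected labels
  filter(`Article Type` %in% valid_types) %>%
  # Remove the now-empty levels ("7", "x") from the factor itself
  mutate(`Article Type` = droplevels(`Article Type`))

cleaned %>% count(`Article Type`)
```

If you would rather keep the bad rows around for inspection, the same %in% test can be negated with ! to list them instead of dropping them.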

1 answer

solution1
1 ACCEPTED 2021-02-23 16:38:54
