
R, drop unspecified factor levels

I have a very sloppy, very large dataset that I am trying to clean up. One of the columns is labelled "Article Type" and it should only have 6 values: "Discussion", "Other", "Cohort Analysis", "Case Series", "Case Study", and "RCT".

It's stored numerically in the raw data, and I use this code to specify which number is which type:

library(dplyr)    # %>% and mutate()
library(forcats)  # fct_recode()
data$`Article Type` <- as.factor(data$`Article Type`)
data <- data %>% mutate(`Article Type` = fct_recode(`Article Type`, "RCT" = "1", "Cohort Analysis" = "2", "Case Series" = "3", "Case Study" = "4", "Discussion" = "5", "Other" = "6"))

The problem is: there's a LOT of messed up data entry in this data set, and when I run this code:

data%>%count(`Article Type`)

Instead of counts of the 6 values I specified, I get this:

[image: output of the count() call; not reproduced here]

I know I can filter by doing something like:

data%>%filter(`Article Type`!="7")

or something, but I'd rather not write that out 30 times for every different value.

Is there a way to code something to the effect of: "If it wasn't one of these 6 levels, drop it"?

You could use the %in% operator to keep only the values you need, instead of excluding every value you don't need:

library(dplyr)

data%>%
  filter(`Article Type` %in% c("Discussion","Other","Cohort Analysis","Case Series","Case Study","RCT")) 
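
A related option, sketched here as an alternative rather than as part of the answer above, is to declare the six valid codes when the factor is built. Base R's factor() turns any value not listed in `levels` into NA, so the stray entries can then be removed with a single filter. The toy `data` tibble and the `type_labels` lookup below are made up for illustration and only stand in for the real dataset.

```r
library(dplyr)

# Hypothetical toy data: valid codes 1-6 plus some bad entries ("7", "x", NA)
data <- tibble(`Article Type` = c("1", "2", "7", "3", "x", "6", "4", "5", "1", NA))

# Hypothetical lookup: raw code -> label, taken from the question's recoding
type_labels <- c("1" = "RCT", "2" = "Cohort Analysis", "3" = "Case Series",
                 "4" = "Case Study", "5" = "Discussion", "6" = "Other")

cleaned <- data %>%
  # Values outside `levels` become NA, so only the six labels can survive
  mutate(`Article Type` = factor(`Article Type`,
                                 levels = names(type_labels),
                                 labels = unname(type_labels))) %>%
  # Drop every row whose code was not one of the six
  filter(!is.na(`Article Type`))

cleaned %>% count(`Article Type`)
```

Because the factor is created with exactly six levels, no leftover levels need to be dropped afterwards. Note that rows that were already NA in the raw column are removed as well.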
