R, drop unspecified factor levels

I have a very sloppy, very large dataset that I am trying to clean up. One of the columns is labelled "Article Type" and it should only have 6 values: "Discussion", "Other", "Cohort Analysis", "Case Series", "Case Study", and "RCT".

It's stored numerically in the raw data, and I use this code to specify which number is which:

data$`Article Type` <- as.factor(data$`Article Type`)
data <- data %>% mutate(`Article Type` = fct_recode(`Article Type`, "RCT" = "1", "Cohort Analysis" = "2", "Case Series" = "3", "Case Study" = "4", "Discussion" = "5", "Other" = "6"))

The problem is: there's a LOT of messed-up data entry in this dataset, and when I run this code:

data%>%count(`Article Type`)

Instead of counts of the 6 values I specified, I get this:

[image: screenshot of the count output]

I know I can filter by doing something like:

data%>%filter(`Article Type`!="7")

or something, but I'd rather not write that out 30 times for every different value.

Is there a way to code something to the effect of: "If it wasn't one of these 6 levels, drop it"?

You could use the %in% operator to keep only the values you need, instead of excluding everything you do not need:

library(dplyr)

data%>%
  filter(`Article Type` %in% c("Discussion","Other","Cohort Analysis","Case Series","Case Study","RCT")) 
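
One thing to keep in mind: filter() removes the rows, but the factor itself still carries the unwanted levels, so they can reappear in levels() or in plots. Below is a minimal sketch of the whole flow on made-up data, assuming the recode from the question has already been applied; the toy data frame and the names valid_types and cleaned are hypothetical, and droplevels() is added here as an extra step, not something the original answer used.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for the real dataset:
# "7" and "x" play the role of the bad data entries.
data <- data.frame(
  `Article Type` = c("1", "2", "7", "3", "x", "6", "4", "5", "1"),
  check.names = FALSE  # keep the space in the column name
)

# Recode the numeric codes as in the question
data$`Article Type` <- as.factor(data$`Article Type`)
data <- data %>%
  mutate(`Article Type` = fct_recode(`Article Type`,
    "RCT" = "1", "Cohort Analysis" = "2", "Case Series" = "3",
    "Case Study" = "4", "Discussion" = "5", "Other" = "6"
  ))

# The six values that are allowed to remain
valid_types <- c("Discussion", "Other", "Cohort Analysis",
                 "Case Series", "Case Study", "RCT")

cleaned <- data %>%
  filter(`Article Type` %in% valid_types) %>%  # keep only rows with a valid type
  droplevels()                                 # remove the now-unused factor levels

cleaned %>% count(`Article Type`)
levels(cleaned$`Article Type`)
```

If you skip droplevels(), the row counts are the same, but levels(cleaned$`Article Type`) would still list "7" and "x".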


 