R, drop factor levels that weren't specified

Question

I have a very sloppy, very large dataset that I am trying to clean up. One of the columns is labelled "Article Type" and it should only have 6 values: "Discussion", "Other", "Cohort Analysis", "Case Series", "Case Study", and "RCT".

It's stored numerically in the raw data, and I use this code to specify which is which:

library(dplyr)
library(forcats)
data$`Article Type` <- as.factor(data$`Article Type`)
data <- data %>% mutate(`Article Type` = fct_recode(`Article Type`, "RCT" = "1", "Cohort Analysis" = "2", "Case Series" = "3", "Case Study" = "4", "Discussion" = "5", "Other" = "6"))

The problem is that there's a LOT of messed-up data entry in this data set, and when I run this code:

data%>%count(`Article Type`)

Instead of counts of the 6 values I specified, I get this:

I know I can filter by doing something like:

data%>%filter(`Article Type`!="7")

but I'd rather not write that out 30 times, once for every stray value.

Is there a way to code something to the effect of: "If it wasn't one of these 6 levels, drop it"?

Answer 1

You could use the %in% operator to keep only the values you need, instead of excluding everything you do not need:

library(dplyr)

data%>%
  filter(`Article Type` %in% c("Discussion","Other","Cohort Analysis","Case Series","Case Study","RCT"))
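One thing to watch for: filter() removes the rows, but the factor still remembers the stray levels, so they can resurface later (in table(), plots, or models). Here is a minimal sketch of the whole cleanup on made-up data. The toy values and the names wanted, cleaned and alt are only for illustration and are not from the original dataset. It also shows an alternative that declares the valid codes up front, so anything else becomes NA.

```r
library(dplyr)
library(forcats)

# Toy data standing in for the real dataset (values invented for illustration)
data <- tibble(`Article Type` = c("1", "2", "7", "3", "x", "6", "4", "5", "1"))

# The six labels we want to keep, in the order of codes 1 to 6
wanted <- c("RCT", "Cohort Analysis", "Case Series",
            "Case Study", "Discussion", "Other")

# Approach 1: recode, keep only the wanted labels, then drop unused levels
cleaned <- data %>%
  mutate(`Article Type` = fct_recode(as.factor(`Article Type`),
    "RCT" = "1", "Cohort Analysis" = "2", "Case Series" = "3",
    "Case Study" = "4", "Discussion" = "5", "Other" = "6")) %>%
  filter(`Article Type` %in% wanted) %>%
  mutate(`Article Type` = fct_drop(`Article Type`))

# Approach 2: declare the valid codes in factor(); any other value becomes NA
alt <- data %>%
  mutate(`Article Type` = factor(`Article Type`,
                                 levels = as.character(1:6),
                                 labels = wanted)) %>%
  filter(!is.na(`Article Type`))

cleaned %>% count(`Article Type`)
```

The second approach avoids writing the labels twice, but it silently turns every unexpected value into NA, so it may be worth counting the NAs before filtering them out.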
