Remove columns whose factors have fewer than 5 observations per level

Question

I have a dataset with more than 100 columns, all of type factor. For example:

          animal               fruit               vehicle              color 
             cat              orange                   car               blue 
             dog               apple                   bus              green 
             dog               apple                   car              green 
             dog              orange                   bus              green

I need to remove every column containing a factor in which any level has fewer than 5 observations. In this example, if I wanted to remove all columns in which some level has 1 observation or fewer, such as blue or cat, the algorithm would remove the columns animal and color. What is the most elegant way to do this?

Answer 1

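We can use Filter with table. The threshold of 2 below matches the 4-row example (drop columns where any level has 1 observation or fewer); for the 5-observation rule in the question, replace 2 with 5.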

Filter(function(x) !any(table(x) < 2), df1)
#  fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus

Data

df1 <- structure(list(animal = structure(c(1L, 2L, 2L, 2L), .Label = c("cat", 
"dog"), class = "factor"), fruit = structure(c(2L, 1L, 1L, 2L
), .Label = c("apple", "orange"), class = "factor"), vehicle = structure(c(2L, 
1L, 2L, 1L), .Label = c("bus", "car"), class = "factor"), color = structure(c(1L, 
2L, 2L, 2L), .Label = c("blue", "green"), class = "factor")),
row.names = c(NA, 
-4L), class = "data.frame")
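The one-liner above hard-codes the threshold and counts unused factor levels as zero, so a column that still carries an empty level (for example after subsetting) would be dropped. A minimal sketch of a reusable version follows, assuming that unused levels should be ignored; the function name drop_sparse_factors and the parameter min_n are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): keep only the columns
# in which every observed level occurs at least `min_n` times.
drop_sparse_factors <- function(df, min_n = 5) {
  keep <- vapply(df, function(x) {
    # droplevels() removes unused levels so they are not counted as 0
    counts <- table(droplevels(as.factor(x)))
    all(counts >= min_n)
  }, logical(1))
  df[, keep, drop = FALSE]
}

# Same example data as in the question, rebuilt with data.frame()
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# Threshold of 2 for the small example; use min_n = 5 on the real data
result <- drop_sparse_factors(df1, min_n = 2)
names(result)
```

With min_n = 2 this keeps fruit and vehicle, the same columns as the Filter call. Using drop = FALSE keeps the result a data frame even when only one column survives.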

Answer 2

We can use select_if from dplyr:

library(dplyr)
df1 %>% select_if(~all(table(.) > 1))

#   fruit vehicle
#1 orange     car
#2  apple     bus
#3  apple     car
#4 orange     bus
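Since dplyr 1.0.0, select_if has been superseded by select() combined with where(). A sketch of the same filter in that style follows, assuming dplyr 1.0.0 or later is installed; the variable names min_n and result are illustrative only.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  animal  = factor(c("cat", "dog", "dog", "dog")),
  fruit   = factor(c("orange", "apple", "apple", "orange")),
  vehicle = factor(c("car", "bus", "car", "bus")),
  color   = factor(c("blue", "green", "green", "green"))
)

# Illustrative threshold: 2 for the small example, 5 for the real data
min_n <- 2

# where() selects the columns for which the predicate returns TRUE
result <- df1 %>% select(where(~ all(table(.x) >= min_n)))
names(result)
```

The predicate is the same as in the select_if version; only the selection helper changes.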
