R: Group by one variable, then count/filter by occurrences of another variable

Question

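I have a data frame of categorical variables and states. For each state, I want to find the most common categorical variable and filter out the rest.

For example: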

1  Alabama   cat_variable_1
2  Alabama   cat_variable_2
3  Alabama   cat_variable_2
4  Alabama   cat_variable_3

For Alabama, cat_variable_2 would be the most common, so the rows with cat_variable_2 would be all that remain in this data frame for Alabama. This would be done for each state.

1  Alabama   cat_variable_2
2  Alabama   cat_variable_2

Thank you very much!

Answer 1

Within each state, you can keep only the rows whose variable is the one that occurs most often.

library(dplyr)
df %>% group_by(state) %>% filter(variable == names(which.max(table(variable))))

#   state   variable      
#  <chr>   <chr>         
#1 Alabama cat_variable_2
#2 Alabama cat_variable_2

You can also write this in base R:

subset(df, as.logical(ave(variable, state, 
           FUN = function(x) x == names(which.max(table(x))))))

And with data.table:

library(data.table)
setDT(df)[, .SD[variable == names(which.max(table(variable)))], state]

Data

df <- structure(list(state = c("Alabama", "Alabama", "Alabama", "Alabama"
), variable = c("cat_variable_1", "cat_variable_2", "cat_variable_2", 
"cat_variable_3")), row.names = c(NA, -4L), class = "data.frame")
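All three snippets in this answer rely on the same expression, names(which.max(table(variable))), so it may help to see it on its own. The following is only a sketch in base R: the helper name most_common and the extra "Alaska" rows are made up for illustration and are not part of the original data.

```r
# Hypothetical helper: the most frequent value of a vector.
# table() counts each value, which.max() picks the first largest count,
# and names() recovers the value itself. On a tie, the value that comes
# first in table()'s sorted order is returned.
most_common <- function(x) names(which.max(table(x)))

# Illustrative data: the Alabama rows from the question plus made-up Alaska rows.
df <- data.frame(
  state    = c("Alabama", "Alabama", "Alabama", "Alabama",
               "Alaska", "Alaska", "Alaska"),
  variable = c("cat_variable_1", "cat_variable_2", "cat_variable_2",
               "cat_variable_3", "cat_variable_1", "cat_variable_1",
               "cat_variable_3")
)

# ave() applies the helper within each state and recycles the single
# result across that state's rows, giving one "winner" per row.
top <- ave(df$variable, df$state, FUN = most_common)

# Keep only the rows that match their state's most common value.
result <- df[df$variable == top, ]
print(result)
```

Because the helper returns a single value per state, the comparison is a plain character match, and no conversion through as.logical() is needed.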

Answer 2

One approach is to create a new df containing the combinations you want, and then use dplyr::inner_join on the original df to keep only those combinations.

library(dplyr)

## An example df with two "states" with different most common cat_var.
df <- tibble(
  state = gl(2, 50, labels = c("AL", "NY")),
  cat_var = case_when(
    state == "AL" ~ sample(1:3, 100, TRUE, prob = c(.2, .3, .5)),
    state == "NY" ~ sample(1:3, 100, TRUE, prob = c(.5, .3, .2))
  ),
  y = rnorm(100)
)

## Keeps the cat_var in each state that is most common, giving a df
## with each state--cat_var comb that we can filter against.
state_vars <-
  df %>%
  count(state, cat_var, sort = TRUE) %>%
  group_by(state) %>%
  slice(1) %>%
  ungroup()

## Use `inner_join` to only keep those comb in `state_vars`.
inner_join(df, state_vars, by = c("state", "cat_var"))
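Note that slice(1) keeps exactly one combination per state, so if two categories are tied for most common, only one of them survives. If you would rather keep every tied category, the same count-then-join idea can be sketched in base R as below. The small data frame and the column name n are assumptions chosen for illustration, with a deliberate tie in "NY".

```r
# Illustrative data: in "NY", cat_var 1 and 2 both occur twice (a tie).
df <- data.frame(
  state   = c("AL", "AL", "AL", "NY", "NY", "NY", "NY"),
  cat_var = c(1, 3, 3, 1, 1, 2, 2),
  y       = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)
)

# Count each state/cat_var combination by summing a helper column of ones.
counts <- aggregate(n ~ state + cat_var, data = transform(df, n = 1), FUN = sum)

# Keep every combination whose count equals its state's maximum, ties included.
top <- counts[counts$n == ave(counts$n, counts$state, FUN = max), ]

# merge() on the key columns plays the role of the inner join:
# only rows of df whose combination appears in `top` are kept.
result <- merge(df, top, by = c("state", "cat_var"))
print(result)
```

Here both tied "NY" categories are retained. Whether that is desirable depends on what "most common" should mean when there is a tie.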

2 answers

Answer 1
0 2020-09-15 08:05:55

Answer 2
0 2020-09-15 08:16:11
