Summarizing counts of a factor with dplyr

Question

I want to group a data frame by a column (owner) and output a new data frame that counts each level of the factor for each observation column. The real data frame is fairly large, and there are 10 different factors.

Here is some example input:

library(dplyr)
df <- as_tibble(data.frame(owner = c(0, 0, 1, 1),
                           obs1 = c("quiet", "loud", "quiet", "loud"),
                           obs2 = c("loud", "loud", "quiet", "quiet")))

  owner  obs1  obs2
1     0 quiet  loud
2     0  loud  loud
3     1 quiet quiet
4     1  loud quiet

I was looking for output that looks like this:

out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))

  owner observation quiet loud
1     0        obs1     1    1
2     0        obs2     0    2
3     1        obs1     1    1
4     1        obs2     2    0

Melting gets me partway there:

library(reshape2)
melted <- as_tibble(melt(df, id.vars = "owner"))

  owner variable value
1     0     obs1 quiet
2     0     obs1  loud
3     1     obs1 quiet
4     1     obs1  loud
5     0     obs2  loud
6     0     obs2  loud
7     1     obs2 quiet
8     1     obs2 quiet

But what's the last step? If 'value' were numeric, I would just do:

melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))
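The same summarise() pattern can count a non-numeric value column, because sum() over a logical vector counts its TRUE entries. A minimal sketch, assuming dplyr 1.0.0 or later (for the .groups argument) and reshape2 for melt(); it hard-codes the two levels "quiet" and "loud", so with many levels the reshaping approaches in the answers scale better.

```r
library(dplyr)
library(reshape2)

df <- data.frame(owner = c(0, 0, 1, 1),
                 obs1 = c("quiet", "loud", "quiet", "loud"),
                 obs2 = c("loud", "loud", "quiet", "quiet"),
                 stringsAsFactors = FALSE)

# Long format: one row per owner/observation/value
melted <- melt(df, id.vars = "owner")

# value == "quiet" is a logical vector; sum() counts its TRUE entries,
# giving the number of rows with that level in each group
out <- melted %>%
  group_by(owner, variable) %>%
  summarise(quiet = sum(value == "quiet"),
            loud = sum(value == "loud"),
            .groups = "drop")
out
```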

Thanks so much!

Answer 1

You could use tidyr together with dplyr:

library(dplyr)
library(tidyr)

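df %>%
  gather(observation, Val, obs1:obs2) %>%  # long format: one row per owner/observation
  group_by(owner, observation, Val) %>%
  summarise(n = n()) %>%                    # count rows in each combination
  ungroup() %>%
  spread(Val, n, fill = 0)                  # one column per level, 0 where absent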

which gives the output:

  #    owner observation loud quiet
  #1     0        obs1    1     1
  #2     0        obs2    2     0
  #3     1        obs1    1     1
  #4     1        obs2    0     2
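Because obs1:obs2 selects a contiguous range of columns by position, it extends to the question's ten columns only if they sit next to each other. A sketch that selects them by name prefix with starts_with() instead, assuming (hypothetically) that every observation column name begins with "obs"; df3 is made-up illustration data with three such columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with three observation columns (the question has ten)
df3 <- tibble(owner = c(0, 0, 1),
              obs1 = c("quiet", "loud", "quiet"),
              obs2 = c("loud", "loud", "quiet"),
              obs3 = c("quiet", "quiet", "loud"))

out3 <- df3 %>%
  gather(observation, Val, starts_with("obs")) %>%  # select by prefix, not position
  group_by(owner, observation, Val) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  spread(Val, n, fill = 0)
out3
```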

Answer 2

As of 2017, the answer is:

library(dplyr)
library(tidyr)

gather(df, key, value, -owner) %>%
  group_by(owner, key, value) %>%
  tally %>% 
  spread(value, n, fill = 0)

which gives the output:

Source: local data frame [4 x 4]
Groups: owner, key [4]

  owner   key  loud quiet
* <dbl> <chr> <dbl> <dbl>
1     0  obs1     1     1
2     0  obs2     2     0
3     1  obs1     1     1
4     1  obs2     0     2

As of 2019, the answer is:

gather(df, key, value, -owner) %>% 
    count(owner, key, value) %>% 
    spread(value, n, fill = 0)
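Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same count-based pipeline with the newer verbs, assuming tidyr 1.1.0 or later, where values_fill accepts a single value:

```r
library(dplyr)
library(tidyr)

df <- tibble(owner = c(0, 0, 1, 1),
             obs1 = c("quiet", "loud", "quiet", "loud"),
             obs2 = c("loud", "loud", "quiet", "quiet"))

out <- df %>%
  pivot_longer(-owner, names_to = "key", values_to = "value") %>%  # long format
  count(owner, key, value) %>%                                      # rows per combination
  pivot_wider(names_from = value, values_from = n, values_fill = 0) # 0 where absent
out
```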

Answer 3

If you want to forgo dplyr, you can split the data into a list of data frames.

df_split <- split(df, list(df[["obs1"]], df[["obs2"]]))

If you want the counts, just call sapply or lapply over the list to get the count of each element, or apply any other function you want.
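A minimal base R sketch of the split-then-lapply idea applied to the question's target layout. It splits by owner rather than by the observation values (an assumption about how the idea would be applied here) and tabulates each observation column with table(), fixing the level order to match the desired output.

```r
df <- data.frame(owner = c(0, 0, 1, 1),
                 obs1 = c("quiet", "loud", "quiet", "loud"),
                 obs2 = c("loud", "loud", "quiet", "quiet"),
                 stringsAsFactors = FALSE)

# One data frame per owner, named "0" and "1"
by_owner <- split(df, df$owner)

# For each owner, count every level in every observation column;
# sapply() simplifies the per-column tables into a levels x columns matrix
counts <- lapply(by_owner, function(d) {
  sapply(d[c("obs1", "obs2")],
         function(col) table(factor(col, levels = c("quiet", "loud"))))
})
counts[["0"]]
```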

3 answers

Answer 1: 29 votes, 2014-09-12 15:45:16
Answer 2: 29 votes, accepted, 2017-01-19 07:05:15
Answer 3: 3 votes, 2015-12-12 01:08:21