Summarizing counts of a factor with dplyr
I want to group a data frame by a column (owner) and output a new data frame that has counts of each level of the factor for each observation column. The real data frame is fairly large, and there are 10 different factors.
Here is some example input:
library(dplyr)
# tibble() keeps obs1/obs2 as character columns (tbl_df() is deprecated)
df <- tibble(owner = c(0, 0, 1, 1), obs1 = c("quiet", "loud", "quiet", "loud"), obs2 = c("loud", "loud", "quiet", "quiet"))
owner obs1 obs2
1 0 quiet loud
2 0 loud loud
3 1 quiet quiet
4 1 loud quiet
I was looking for output that looks like this:
out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))
owner observation quiet loud
1 0 obs1 1 1
2 0 obs2 0 2
3 1 obs1 1 1
4 1 obs2 2 0
Melting gets me partway there:
library(reshape2)  # melt() comes from reshape2, not dplyr
melted <- as_tibble(melt(as.data.frame(df), id.vars = "owner"))
owner variable value
1 0 obs1 quiet
2 0 obs1 loud
3 1 obs1 quiet
4 1 obs1 loud
5 0 obs2 loud
6 0 obs2 loud
7 1 obs2 quiet
8 1 obs2 quiet
But what's the last step? If value were numeric, I'd just do:
melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))
Thanks so much!
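When the factor levels are known in advance, the summarise() idea from the question can be made to work by summing a logical comparison, because TRUE counts as 1. This is a minimal sketch, assuming dplyr 1.0 or later (for the .groups argument); the melted data is typed in directly instead of calling melt(), and the object name counts is illustrative. With 10 levels, listing each one by hand gets tedious, which is why the reshaping answers below generalize better.

```r
library(dplyr)

# Long data as produced by melt() in the question, typed in directly
melted <- data.frame(
  owner    = c(0, 0, 1, 1, 0, 0, 1, 1),
  variable = c("obs1", "obs1", "obs1", "obs1", "obs2", "obs2", "obs2", "obs2"),
  value    = c("quiet", "loud", "quiet", "loud", "loud", "loud", "quiet", "quiet"),
  stringsAsFactors = FALSE
)

counts <- melted %>%
  group_by(owner, variable) %>%
  # sum() of a logical vector counts the TRUEs, i.e. rows matching each level
  summarise(quiet = sum(value == "quiet"),
            loud  = sum(value == "loud"),
            .groups = "drop")
counts
```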
You could use tidyr with dplyr:
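library(dplyr)
library(tidyr)
df %>%
  gather(observation, Val, obs1:obs2) %>%  # long format: one row per owner/observation/value
  group_by(owner, observation, Val) %>%
  summarise(n = n()) %>%                   # count rows for each level within each group
  ungroup() %>%
  spread(Val, n, fill = 0)                 # one column per level; missing combinations become 0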
which gives the output:
# owner observation loud quiet
#1 0 obs1 1 1
#2 0 obs2 2 0
#3 1 obs1 1 1
#4 1 obs2 0 2
In 2017, the answer is:
library(dplyr)
library(tidyr)
gather(df, key, value, -owner) %>%
group_by(owner, key, value) %>%
tally %>%
spread(value, n, fill = 0)
which gives the output:
Source: local data frame [4 x 4]
Groups: owner, key [4]
owner key loud quiet
* <dbl> <chr> <dbl> <dbl>
1 0 obs1 1 1
2 0 obs2 2 0
3 1 obs1 1 1
4 1 obs2 0 2
In 2019, the answer is:
gather(df, key, value, -owner) %>%
count(owner, key, value) %>%
spread(value, n, fill = 0)
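Since tidyr 1.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same count-then-widen approach with the newer verbs, assuming tidyr 1.1.0 or later (where values_fill accepts a single value); the column names observation and value and the object name wide are illustrative choices:

```r
library(dplyr)
library(tidyr)

df <- tibble(owner = c(0, 0, 1, 1),
             obs1 = c("quiet", "loud", "quiet", "loud"),
             obs2 = c("loud", "loud", "quiet", "quiet"))

wide <- df %>%
  # every column except owner becomes an (observation, value) pair
  pivot_longer(-owner, names_to = "observation", values_to = "value") %>%
  # one row per owner/observation/level, with the tally in column n
  count(owner, observation, value) %>%
  # one column per level; combinations that never occur are filled with 0
  pivot_wider(names_from = value, values_from = n, values_fill = 0)
wide
```

Because -owner selects every other column, this covers all 10 factor columns in the real data without listing them.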
If you want to forgo dplyr, you can split the data into a list of data frames:
groups <- split(df, list(df[["obs1"]], df[["obs2"]]))
If you want the counts, just write an sapply() or lapply() call that runs through the list and gets the count for each element. Or literally any other function you want.
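A minimal base-R sketch of the split-and-apply idea. Note that it splits by owner rather than by the obs1/obs2 combination, so that each piece yields one row per observation column, matching the shape asked for in the question. The level set is taken from the data, and the names obs_cols, levs, by_owner and result are illustrative:

```r
# Example data from the question (character columns, not factors)
df <- data.frame(owner = c(0, 0, 1, 1),
                 obs1 = c("quiet", "loud", "quiet", "loud"),
                 obs2 = c("loud", "loud", "quiet", "quiet"),
                 stringsAsFactors = FALSE)

obs_cols <- setdiff(names(df), "owner")                  # any number of factor columns
levs <- unique(unlist(df[obs_cols], use.names = FALSE))  # every level seen in the data

by_owner <- split(df, df$owner)                          # one data frame per owner
rows <- lapply(names(by_owner), function(o) {
  d <- by_owner[[o]]
  # Count every level in each observation column; fixing the factor levels
  # makes absent levels show up as 0 instead of being dropped
  counts <- t(sapply(obs_cols, function(col) table(factor(d[[col]], levels = levs))))
  data.frame(owner = o, observation = rownames(counts), counts,
             row.names = NULL, stringsAsFactors = FALSE)
})
result <- do.call(rbind, rows)                           # stack the per-owner pieces
result
```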