Count combination of variables based on unique column value
Given a data frame df:
id=c(12,12,13,14,14,15,16,17,18,18)
reg = c('FR','FR','DE','US','US','TZ','MK','GR','ES','ES')
code1=c('F56','G76','G56','T78','G78','G76','G64','T65','G79','G56')
code2=c('G56','I89','J83','S46','D78','G56','H89','G56','W34','T89')
bin1= c(0,1,1,0,1,1,0,0,0,1)
bin2= c(1,0,1,0,0,1,1,1,0,0)
bin3= c(0,0,0,1,1,0,0,1,0,1)
df = data.frame(id, reg, code1, code2, bin1, bin2, bin3)
which looks like this:
id reg code1 code2 bin1 bin2 bin3
12 FR F56 G56 0 1 0
12 FR G76 I89 1 0 0
13 DE G56 J83 1 1 0
14 US T78 S46 0 0 1
14 US G78 D78 1 0 1
15 TZ G76 G56 1 1 0
16 MK G64 H89 0 1 0
17 GR T65 G56 0 1 1
18 ES G79 W34 0 0 0
18 ES G56 T89 1 0 1
I'm trying to count the number of occurrences of each combination of the binary variables (bin1, bin2, bin3), aggregated by unique id, something like:
bin1 bin2 bin3 count
1 1 0 3
1 0 1 2
0 1 0 1
0 1 1 1
Any suggestions welcome! Cheers
If I understood you correctly, you want to combine the rows of each id using something like an OR operator and then count the unique combinations. Since the values are all 0s and 1s to start with, you can take the max of each column within each id. Try the following in dplyr:
library(dplyr)
df %>%
select(id,bin1,bin2,bin3) %>%
group_by(id) %>%
summarise_all(max) %>%
count(bin1,bin2,bin3)
# A tibble: 4 x 4
bin1 bin2 bin3 n
<dbl> <dbl> <dbl> <int>
1 0 1 0 1
2 0 1 1 1
3 1 0 1 2
4 1 1 0 3
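The same pipeline can also be written with across(), which recent dplyr releases recommend over summarise_all(). This is a minimal sketch, assuming dplyr 1.0.0 or later; the name combo_counts is only illustrative, and the data frame is rebuilt with just the columns used here so the snippet runs on its own.

```r
library(dplyr)

# Sample data from the question, reduced to the columns that matter here
df <- data.frame(
  id   = c(12, 12, 13, 14, 14, 15, 16, 17, 18, 18),
  bin1 = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 1),
  bin2 = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0),
  bin3 = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 1)
)

# combo_counts is a hypothetical name, not part of the original answer
combo_counts <- df %>%
  group_by(id) %>%
  # max() over 0/1 values acts as a logical OR within each id
  summarise(across(c(bin1, bin2, bin3), max), .groups = "drop") %>%
  # count how many ids share each (bin1, bin2, bin3) combination
  count(bin1, bin2, bin3)

print(combo_counts)
```

The grouping and counting logic is unchanged; only the column selection moves from select() into across().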
Without installing dplyr, you can do this in base R:
by_id = aggregate(df[,c("bin1","bin2","bin3")],list(id=df$id),max)
aggregate(id~bin1+bin2+bin3,by_id,length)
bin1 bin2 bin3 id
1 0 1 0 1
2 1 1 0 3
3 1 0 1 2
4 0 1 1 1
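As a quick cross-check of these counts, the per-id combinations can also be pasted into strings and tabulated. This is only a sketch in base R; the names combo and counts are illustrative, and the data frame is rebuilt with just the needed columns so the snippet is self-contained.

```r
# Sample data from the question, reduced to the columns that matter here
df <- data.frame(
  id   = c(12, 12, 13, 14, 14, 15, 16, 17, 18, 18),
  bin1 = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 1),
  bin2 = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0),
  bin3 = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 1)
)

# One row per id; max() over 0/1 values acts as a logical OR
by_id <- aggregate(cbind(bin1, bin2, bin3) ~ id, data = df, FUN = max)

# Encode each id's combination as a string such as "1 1 0" (hypothetical helper names)
combo <- paste(by_id$bin1, by_id$bin2, by_id$bin3)

# Tabulate how many ids fall into each combination
counts <- table(combo)
print(counts)
```

The string keys make it easy to look up a single combination, for example counts["1 1 0"].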