
Count combination of variables based on unique column value

For a df:

id=c(12,12,13,14,14,15,16,17,18,18)
reg = c('FR','FR','DE','US','US','TZ','MK','GR','ES','ES')
code1=c('F56','G76','G56','T78','G78','G76','G64','T65','G79','G56')
code2=c('G56','I89','J83','S46','D78','G56','H89','G56','W34','T89')
bin1= c(0,1,1,0,1,1,0,0,0,1)
bin2= c(1,0,1,0,0,1,1,1,0,0)
bin3= c(0,0,0,1,1,0,0,1,0,1)
df = data.frame(id,reg,code1,code2, bin1, bin2, bin3)

which looks like:

id  reg code1 code2 bin1 bin2 bin3
12  FR  F56   G56    0    1    0
12  FR  G76   I89    1    0    0
13  DE  G56   J83    1    1    0
14  US  T78   S46    0    0    1
14  US  G78   D78    1    0    1
15  TZ  G76   G56    1    1    0
16  MK  G64   H89    0    1    0
17  GR  T65   G56    0    1    1
18  ES  G79   W34    0    0    0
18  ES  G56   T89    1    0    1

I'm trying to count the number of occurrences of each combination of the binary variables (bin1, bin2, bin3), aggregated by unique id, something like:

bin1 bin2 bin3 count
  1   1    0    3
  1   0    1    2
  0   1    0    1
  0   1    1    1

Any suggestions welcome! Cheers

If I understood you correctly, you want to aggregate within each id using something like an OR operator, and then count the resulting unique combinations. Since the values are all 0s and 1s to start with, taking the max of each column within each id gives the same result as an OR. Try the following in dplyr:

library(dplyr)
df %>% 
select(id,bin1,bin2,bin3) %>% 
group_by(id) %>% 
summarise_all(max) %>% 
count(bin1,bin2,bin3)

# A tibble: 4 x 4
   bin1  bin2  bin3     n
  <dbl> <dbl> <dbl> <int>
1     0     1     0     1
2     0     1     1     1
3     1     0     1     2
4     1     1     0     3
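
For readers on a recent dplyr, the same two steps can probably be written with `across()` instead of `summarise_all()`. This is only a sketch, assuming dplyr 1.0.0 or later is installed; the names `per_id` and `counts` are made up for illustration, and the data frame is rebuilt with just the columns the count needs so the snippet runs on its own.

```r
library(dplyr)

# Sample data from the question (only the columns used for counting)
df <- data.frame(
  id   = c(12, 12, 13, 14, 14, 15, 16, 17, 18, 18),
  bin1 = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 1),
  bin2 = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0),
  bin3 = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 1)
)

# Step 1: collapse to one row per id; max() over 0/1 values behaves like OR
per_id <- df %>%
  group_by(id) %>%
  summarise(across(c(bin1, bin2, bin3), max))

# Step 2: count how many ids share each bin1/bin2/bin3 combination
counts <- per_id %>% count(bin1, bin2, bin3)
print(counts)
```

Keeping the intermediate `per_id` table around makes it easy to check which ids fall into which combination before counting.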

Without installing dplyr, you can do this in base R:

by_id = aggregate(df[,c("bin1","bin2","bin3")],list(id=df$id),max)
aggregate(id~bin1+bin2+bin3,by_id,length)
  bin1 bin2 bin3 id
1    0    1    0  1
2    1    1    0  3
3    1    0    1  2
4    0    1    1  1
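
To see why max works as an OR here, it may help to inspect the intermediate per-id table and compare it with an explicit `any()`-based aggregation. The following is a sketch in base R only; `bins`, `by_id_or` and `counts` are names introduced for illustration, and the comparison assumes the bin columns contain nothing but 0 and 1.

```r
# Sample data from the question (only the columns used for counting)
df <- data.frame(
  id   = c(12, 12, 13, 14, 14, 15, 16, 17, 18, 18),
  bin1 = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 1),
  bin2 = c(1, 0, 1, 0, 0, 1, 1, 1, 0, 0),
  bin3 = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 1)
)
bins <- c("bin1", "bin2", "bin3")

# One row per id, taking the column-wise max within each id
by_id <- aggregate(df[, bins], list(id = df$id), max)

# Same aggregation written as an explicit OR: 1 if any row of the id has a 1
by_id_or <- aggregate(df[, bins], list(id = df$id),
                      function(x) as.numeric(any(x == 1)))

print(by_id)
print(isTRUE(all.equal(by_id, by_id_or)))  # do the two approaches agree?

# Count the ids per combination, as in the answer above
counts <- aggregate(id ~ bin1 + bin2 + bin3, by_id, length)
print(counts)
```

If the columns could contain values other than 0 and 1, the `any()` version states the intent more directly than `max`.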
