How to summarize pairwise counts based on multiple columns in R?

In my dataset, I have plant counts over several months (`time`) across a few sites (`site`). I identified the plant species (`who`) and measured flower counts (`val`). Note that in a given month, only a few species flower at some sites (May, sites 1 and 2), while all species flower at others (June). I am first trying to subset this data to only three species of interest: "A", "B" and "C".

time <- c("May","May","May","May","May","May","May","May","Jun","Jun","Jun","Jun")
site <- c(1,1,1,1,2,2,2,2,1,1,1,1)
who <- c("A","B","C","D","A","B","C","D","A","B","C","D")
val <- c(12,0,1,2,4,6,0,8,10,2,10,2)

df.test <- data.frame(time, site, who, val)

# First, subset the rows where `who` is A, B or C
df.test <- df.test[df.test$who == c("A","B","C"), ]

Problem: I am not sure why this only picks up rows from site 1. I am looking for 9 rows, not 3:
  time site who val
1  May    1   A  12
2  May    1   B   0
3  May    1   C   1

Then, based on the corrected subset, I want to count how many unique site-months (unique in `time` and `site`) have positive, non-zero values for only A and B; only A and C; only B and C; or all of A, B and C. The expected counts are:

A and B only = 1
A and C only = 1
B and C only = 0
A, B and C only = 1

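I think this is pretty close to what you want. Note that the table counts the rows recorded for each species, site and month; it does not yet check whether `val` is positive: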

df.test2 <- df.test[df.test$who %in% c("A","B","C"), ]
tbl <- xtabs(~who+site+time, df.test2)
(tbl2 <- ftable(tbl, row.vars=1, col.vars=2:3))
#     site   1       2    
#     time Jun May Jun May
# who                     
# A          1   1   0   1
# B          1   1   0   1
# C          1   1   0   1
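
For context on why the subset in the question returned only three rows: `==` compares element by element and recycles the shorter vector, whereas `%in%` tests each element for membership. The following is a minimal sketch of that difference using the question's `who` vector; the names `keep`, `eq_mask` and `in_mask` are introduced here purely for illustration.

```r
who  <- c("A","B","C","D","A","B","C","D","A","B","C","D")
keep <- c("A","B","C")

# `==` recycles `keep` along `who`: element 4 ("D") is compared with "A",
# element 5 ("A") with "B", element 6 ("B") with "C", and so on.
eq_mask <- who == keep

# `%in%` asks, for every element of `who`, whether it occurs anywhere in `keep`.
in_mask <- who %in% keep

sum(eq_mask)    # 3
which(eq_mask)  # 1 2 3
sum(in_mask)    # 9
```

Only the first three positions happen to line up under recycling, which is why the `==` version kept just the May rows for site 1.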

Something like this:

library(dplyr)

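df.test %>% 
  filter(who %in% c("A", "B", "C")) %>%        # keep the three species of interest
  group_by(site, time) %>%                     # one group per site-month
  mutate(x = ifelse(val > 0, who, FALSE)) %>%  # species name if it flowered, otherwise FALSE
  do(data.frame(t(combn(.$x, 3)))) %>%         # spread the three flags into columns X1, X2, X3
  count(site, time, X1, X2, X3)                # one row per site-month combination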
   site time  X1    X2    X3        n
  <dbl> <chr> <chr> <chr> <chr> <int>
1     1 Jun   A     B     C         1
2     1 May   A     FALSE C         1
3     2 May   A     B     FALSE     1
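
To get from a per-site-month result like this to the four summary counts listed in the question, one possible base R approach is to build a label for the set of flowering species in each site-month and then tabulate those labels. This is a sketch under assumptions, not the answerers' method: the names `pos`, `combo`, `wanted` and `counts` are hypothetical, and it assumes `df.test` is the original, unsubsetted data frame.

```r
time <- c("May","May","May","May","May","May","May","May","Jun","Jun","Jun","Jun")
site <- c(1,1,1,1,2,2,2,2,1,1,1,1)
who  <- c("A","B","C","D","A","B","C","D","A","B","C","D")
val  <- c(12,0,1,2,4,6,0,8,10,2,10,2)
df.test <- data.frame(time, site, who, val, stringsAsFactors = FALSE)

# Keep only the species of interest that actually flowered (val > 0).
pos <- df.test[df.test$who %in% c("A","B","C") & df.test$val > 0, ]

# One label per site-month: the flowering species, sorted and pasted together.
combo <- aggregate(who ~ site + time, data = pos,
                   FUN = function(x) paste(sort(x), collapse = ","))

# Tabulate against a fixed set of levels so absent combinations show up as 0.
wanted <- c("A,B", "A,C", "B,C", "A,B,C")
counts <- table(factor(combo$who, levels = wanted))
counts
# A,B = 1, A,C = 1, B,C = 0, A,B,C = 1
```

Site-months where only a single species flowers would get a label such as "A", which is not in `wanted` and is therefore dropped from the table; add such labels to `wanted` if they should be counted too.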
