
Counting the distinct values for each day and group and inserting the value in an array in R

I want to transform the data below into an array of association matrices, one per day, where each entry is the number of unique IDs shared by a pair of groups on that day. So, for example, from the data below

Year Month  Day Group ID
2014    04  26   1    A
2014    04  26   1    B
2014    04  26   2    B
2014    04  26   2    C
2014    05  12   1    B
2014    05  12   2    E
2014    05  12   2    F
2014    05  12   2    G
2014    05  12   3    G
2014    05  12   3    F
2015    05  19   1    F
2015    05  19   1    D
2015    05  19   2    E
2015    05  19   2    G
2015    05  19   2    D
2015    05  19   3    A
2015    05  19   3    E
2015    05  19   3    B

I want to make an array that gives:

[1] (04/26/2014)
Grp 1   2   3
1   0   1   0
2   1   0   0
3   0   0   0

[2] (05/12/2014)
Grp 1   2   3
1   0   0   1
2   0   0   2
3   1   2   0

[3] (05/19/2015)
Grp 1   2   3
1   0   1   0
2   1   0   1
3   0   1   0

'Grp' just labels the group number. I know how to count distinct values over the whole table, but I am trying to use for loops to fill in the appropriate count for each day: e.g., counting the unique IDs that are present in both group 1 and group 2 on 04/26/2014 and inserting that number into the group 1 / group 2 cells of that day's association matrix. Any help would be appreciated.

I don't quite understand how you get the second matrix, but you can try this:

dd <- read.table(header = TRUE, text = "Year Month  Day Group ID
2014    04  26   1    A
2014    04  26   1    B
2014    04  26   2    B
2014    04  26   2    C
2014    05  12   1    B
2014    05  12   2    E
2014    05  12   2    F
2014    05  12   2    G
2014    05  12   3    G
2014    05  12   3    F
2015    05  19   1    F
2015    05  19   1    D
2015    05  19   2    E
2015    05  19   2    G
2015    05  19   2    D
2015    05  19   3    A
2015    05  19   3    E
2015    05  19   3    B")

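dd <- within(dd, {
  # Combine Year/Month/Day into a single Date column
  date <- as.Date(paste(Year, Month, Day, sep = '-'))
  # Factor, so every per-day table keeps all three groups even if one is absent
  Group <- factor(Group)
  # Drop the original date columns
  Year <- Month <- Day <- NULL
})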

E.g., for the first date:

sp <- split(dd, dd$date)[[1]]
tbl <- table(sp$ID, sp$Group)
`diag<-`(crossprod(tbl), 0)

#   1 2 3
# 1 0 1 0
# 2 1 0 0
# 3 0 0 0
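
Why this works, as a minimal sketch: tbl is an ID-by-group incidence table, and crossprod(tbl) is t(tbl) %*% tbl, so entry [i, j] sums, over IDs, the product of the counts in group i and group j. Assuming each ID appears at most once per group per day (true for the sample data, but an assumption in general), that sum is the number of IDs present in both groups, and the diagonal is the number of IDs in each group, which is why it gets zeroed. The check below rebuilds only the first day by hand and compares against intersect():

```r
# First day only (2014-04-26), rebuilt by hand for a self-contained check
sp <- data.frame(
  Group = factor(c(1, 1, 2, 2), levels = 1:3),
  ID    = c("A", "B", "B", "C")
)

tbl <- table(sp$ID, sp$Group)   # rows = IDs, columns = groups
cp  <- crossprod(tbl)           # same as t(tbl) %*% tbl

# Off-diagonal entry [i, j]: number of IDs present in both group i and group j
shared_12 <- length(intersect(sp$ID[sp$Group == 1], sp$ID[sp$Group == 2]))

# Diagonal entry [i, i]: number of IDs in group i
n_group1 <- length(unique(sp$ID[sp$Group == 1]))

cp["1", "2"] == shared_12   # both count the single shared ID, "B"
cp["1", "1"] == n_group1    # both count "A" and "B"
```

If an ID could be repeated within a group on the same day, deduplicating first with unique(sp[c("ID", "Group")]) should keep the counts distinct.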

And to do them all at once:

lapply(split(dd, dd$date), function(x) {
  cp <- crossprod(table(x$ID, x$Group))
  diag(cp) <- 0
  cp
})

# $`2014-04-26`
# 
#     1 2 3
#   1 0 1 0
#   2 1 0 0
#   3 0 0 0
# 
# $`2014-05-12`
# 
#     1 2 3
#   1 0 0 0
#   2 0 0 2
#   3 0 2 0
# 
# $`2015-05-19`
# 
#     1 2 3
#   1 0 1 0
#   2 1 0 1
#   3 0 1 0
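
Since the question asks for an array rather than a list, one possible variation (a sketch, not part of the answer above) is to let sapply() stack the per-day matrices into a 3-D array with simplify = "array". The data frame is rebuilt compactly here so the block runs on its own; the name arr is just illustrative.

```r
# Same data as in the question, with the date already combined
dd <- data.frame(
  date  = as.Date(rep(c("2014-04-26", "2014-05-12", "2015-05-19"), c(4, 6, 8))),
  Group = factor(c(1, 1, 2, 2,  1, 2, 2, 2, 3, 3,  1, 1, 2, 2, 2, 3, 3, 3)),
  ID    = c("A", "B", "B", "C",  "B", "E", "F", "G", "G", "F",
            "F", "D", "E", "G", "D", "A", "E", "B")
)

# One group-by-group matrix per date, stacked along a third dimension
arr <- sapply(split(dd, dd$date), function(x) {
  cp <- crossprod(table(x$ID, x$Group))
  diag(cp) <- 0
  cp
}, simplify = "array")

dim(arr)                        # group x group x date
arr[, , "2014-05-12"]           # association matrix for one day
arr["2", "3", "2014-05-12"]     # IDs shared by groups 2 and 3 on that day
```

Indexing by date name, as in the last two lines, avoids depending on the order of the dates.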

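A possible solution with dplyr and tidyr, which counts the distinct IDs within each group per day (rather than the IDs shared between groups), is as follows:

library(dplyr)
library(tidyr)
# df is assumed to hold the question's data (columns Year, Month, Day, Group, ID)
df$date <- as.Date(paste(df$Year, df$Month, df$Day, sep = '-'))
df %>%
  expand(date, Group) %>%
  left_join(df, by = c("date", "Group")) %>%
  group_by(date, Group) %>%
  # na.rm = TRUE so a group with no rows on a date counts 0, not 1 for the NA
  summarise(nID = n_distinct(ID, na.rm = TRUE)) %>%
  split(., .$date)

Resulting output:

$`2014-04-26`
Source: local data frame [3 x 3]
Groups: date [1]

        date Group   nID
      (date) (int) (int)
1 2014-04-26     1     2
2 2014-04-26     2     2
3 2014-04-26     3     0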

$`2014-05-12`
Source: local data frame [3 x 3]
Groups: date [1]

        date Group   nID
      (date) (int) (int)
1 2014-05-12     1     1
2 2014-05-12     2     3
3 2014-05-12     3     2

$`2015-05-19`
Source: local data frame [3 x 3]
Groups: date [1]

        date Group   nID
      (date) (int) (int)
1 2015-05-19     1     2
2 2015-05-19     2     3
3 2015-05-19     3     3
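
The per-group counts above are not the pairwise matrices the question asks for. As a hedged sketch of how a join-based approach could get there, the example below self-joins the data on date and ID in base R (merge() rather than dplyr, to keep it dependency-free) and cross-tabulates the resulting group pairs with xtabs(). The names pairs and shared are illustrative, and the sketch assumes each ID appears at most once per group per day.

```r
# Same data as in the question, with the date kept as a character label
dd <- data.frame(
  date  = rep(c("2014-04-26", "2014-05-12", "2015-05-19"), c(4, 6, 8)),
  Group = factor(c(1, 1, 2, 2,  1, 2, 2, 2, 3, 3,  1, 1, 2, 2, 2, 3, 3, 3)),
  ID    = c("A", "B", "B", "C",  "B", "E", "F", "G", "G", "F",
            "F", "D", "E", "G", "D", "A", "E", "B")
)

# Self-join: one row per (date, ID, Group.x, Group.y) combination
pairs <- merge(dd, dd, by = c("date", "ID"))

# Keep only pairs of different groups (drops what would be the diagonal)
pairs <- pairs[as.character(pairs$Group.x) != as.character(pairs$Group.y), ]

# Cross-tabulate into a group x group x date array of shared-ID counts
shared <- xtabs(~ Group.x + Group.y + date, data = pairs)

shared[, , "2014-05-12"]
```

Because Group is a factor, xtabs() keeps all three group levels in every slice, even for days on which a group shares nothing.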
