I want to transform the data below into an association array holding, for each day, the count of unique IDs shared between each pair of groups. So, for example, from the data below:
Year Month Day Group ID
2014 04 26 1 A
2014 04 26 1 B
2014 04 26 2 B
2014 04 26 2 C
2014 05 12 1 B
2014 05 12 2 E
2014 05 12 2 F
2014 05 12 2 G
2014 05 12 3 G
2014 05 12 3 F
2015 05 19 1 F
2015 05 19 1 D
2015 05 19 2 E
2015 05 19 2 G
2015 05 19 2 D
2015 05 19 3 A
2015 05 19 3 E
2015 05 19 3 B
I want to make an array that gives:
[1] (04/26/2014)
Grp 1 2 3
1 0 1 0
2 1 0 0
3 0 0 0
[2] (05/12/2014)
Grp 1 2 3
1 0 0 1
2 0 0 2
3 1 2 0
[3] (05/19/2015)
Grp 1 2 3
1 0 1 0
2 1 0 1
3 0 1 0
The 'Grp' label just indicates the group number. I know how to count the distinct values within the table overall, but I'm trying to use for loops to insert the appropriate count for each day: for example, counting the unique IDs that are present in both group 1 and group 2 on 04/26/2014 and inserting that number in the group 1 / group 2 cells of the association matrix for that day. Any help would be appreciated.
I don't quite understand how you get the second one, but you can try this:
dd <- read.table(header = TRUE, text = "Year Month Day Group ID
2014 04 26 1 A
2014 04 26 1 B
2014 04 26 2 B
2014 04 26 2 C
2014 05 12 1 B
2014 05 12 2 E
2014 05 12 2 F
2014 05 12 2 G
2014 05 12 3 G
2014 05 12 3 F
2015 05 19 1 F
2015 05 19 1 D
2015 05 19 2 E
2015 05 19 2 G
2015 05 19 2 D
2015 05 19 3 A
2015 05 19 3 E
2015 05 19 3 B")
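dd <- within(dd, {
  # build a Date from the three date columns (as.Date accepts "2014-4-26")
  date <- as.Date(paste(Year, Month, Day, sep = '-'))
  # keep every group as a factor level so empty groups still get a row/column
  Group <- factor(Group)
  # drop the original date columns
  Year <- Month <- Day <- NULL
})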
For example, for the first date:
sp <- split(dd, dd$date)[[1]]
tbl <- table(sp$ID, sp$Group)
`diag<-`(crossprod(tbl), 0)
# 1 2 3
# 1 0 1 0
# 2 1 0 0
# 3 0 0 0
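To see why this works: table(sp$ID, sp$Group) is an ID-by-group incidence matrix, and crossprod(tbl) is t(tbl) %*% tbl, so entry [i, j] is the number of IDs present in both group i and group j. Below is a minimal sketch that checks this against the explicit for loops mentioned in the question, using only the first day's rows. It assumes each ID appears at most once per group per day (otherwise the products would count repeats); the inline sp and the names ids and shared are illustrative, not part of the answer above.

```r
# First day's rows from the question, rebuilt inline so the block is self-contained
sp <- data.frame(
  Group = factor(c(1, 1, 2, 2), levels = 1:3),
  ID    = c("A", "B", "B", "C")
)

tbl <- table(sp$ID, sp$Group)   # rows = IDs, columns = groups, cells = 0/1
cp  <- crossprod(tbl)           # t(tbl) %*% tbl: shared-ID counts per group pair
diag(cp) <- 0                   # a group's overlap with itself is not wanted

# The same counts with explicit loops over pairs of groups
ids <- split(sp$ID, sp$Group)   # one character vector of IDs per group
shared <- matrix(0, 3, 3, dimnames = list(1:3, 1:3))
for (i in 1:3) {
  for (j in 1:3) {
    if (i != j) shared[i, j] <- length(intersect(ids[[i]], ids[[j]]))
  }
}

all(as.vector(cp) == as.vector(shared))
```

The loop version is easier to adapt if the definition of "shared" changes, while crossprod avoids the double loop entirely.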
And to do them all at once:
lapply(split(dd, dd$date), function(x) {
cp <- crossprod(table(x$ID, x$Group))
diag(cp) <- 0
cp
})
# $`2014-04-26`
#
# 1 2 3
# 1 0 1 0
# 2 1 0 0
# 3 0 0 0
#
# $`2014-05-12`
#
# 1 2 3
# 1 0 0 0
# 2 0 0 2
# 3 0 2 0
#
# $`2015-05-19`
#
# 1 2 3
# 1 0 1 0
# 2 1 0 1
# 3 0 1 0
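Since the question asks for an array rather than a list, the per-day matrices can be stacked into a three-dimensional array. This is a minimal sketch using base R's simplify2array, rebuilding only the first two days inline so it runs on its own; the names res and arr are illustrative, and it assumes Group is a factor with the same levels on every day so that all matrices have the same dimensions.

```r
# First two days from the question, already in date / Group / ID form
dd <- data.frame(
  date  = as.Date(rep(c("2014-04-26", "2014-05-12"), times = c(4, 6))),
  Group = factor(c(1, 1, 2, 2, 1, 2, 2, 2, 3, 3)),
  ID    = c("A", "B", "B", "C", "B", "E", "F", "G", "G", "F")
)

# One group-by-group matrix per day, as in the lapply() call above
res <- lapply(split(dd, dd$date), function(x) {
  cp <- crossprod(table(x$ID, x$Group))
  diag(cp) <- 0
  cp
})

# Stack the equally sized matrices along a third dimension named by date
arr <- simplify2array(res)
dim(arr)
arr[, , "2014-05-12"]
```

Individual days can then be pulled out by name or position, for example arr[, , 1] for the first date.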
A possible solution with dplyr and tidyr is as follows:
library(dplyr)
library(tidyr)
# df holds the raw data from the question (columns Year, Month, Day, Group, ID)
df$date <- as.Date(paste(df$Year, df$Month, df$Day, sep = '-'))
df %>%
expand(date, Group) %>%
left_join(., df) %>%
group_by(date, Group) %>%
summarise(nID = n_distinct(ID)) %>%
split(., .$date)
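Resulting output, giving the number of distinct IDs per group and date rather than the pairwise overlaps (the nID of 1 for group 3 on 2014-04-26 comes from the NA that the join introduces for the empty group; n_distinct(ID, na.rm = TRUE) would give 0 there):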
$`2014-04-26`
Source: local data frame [3 x 3]
Groups: date [1]
date Group nID
(date) (int) (int)
1 2014-04-26 1 2
2 2014-04-26 2 2
3 2014-04-26 3 1
$`2014-05-12`
Source: local data frame [3 x 3]
Groups: date [1]
date Group nID
(date) (int) (int)
1 2014-05-12 1 1
2 2014-05-12 2 3
3 2014-05-12 3 2
$`2015-05-19`
Source: local data frame [3 x 3]
Groups: date [1]
date Group nID
(date) (int) (int)
1 2015-05-19 1 2
2 2015-05-19 2 3
3 2015-05-19 3 3