i have a data frame as follows, I want to create a matrix, where the average sleep duration hour (SLP) is shown according to 3 recruiting site(site) and 5 recruiting year(year).
SLP site year
8.6 1 2008
7.2 1 2005
6.4 2 2006
9.5 3 2007
6.1 2 2009
3.6 1 2005
8.6 1 2008
7.2 1 2005
6.4 2 2006
9.5 3 2007
6.1 2 2009
5.1 3 2008
2.1 2 2006
My desired output is:
1 2 3
2005 6.00 - -
2006 - 4.97 -
2007 - - 9.5
2008 8.60 - 5.1
2009 - 6.10 -
Column name is variable of site, row name is variable of year and values in each cell is average of SLP. How do I do this?
Here are a few different solutions which use no packages:
1) tapply This uses no packages. It produces a "matrix"
output with NA values for the empty cells:
tapply(DF$SLP, DF[c("year", "site")], mean)
giving:
site
year 1 2 3
2005 6.0 NA NA
2006 NA 4.966667 NA
2007 NA NA 9.5
2008 8.6 NA 5.1
2009 NA 6.100000 NA
2) aggregate/xtabs This uses aggregate
+ xtabs
. This creates an object of class c("xtabs", "table")
with zero values for empty cells:
fo <- SLP ~ year + site
xtabs(fo, aggregate(fo, DF, mean))
giving;
site
year 1 2 3
2005 6.000000 0.000000 0.000000
2006 0.000000 4.966667 0.000000
2007 0.000000 0.000000 9.500000
2008 8.600000 0.000000 5.100000
2009 0.000000 6.100000 0.000000
3) aggregate/reshape This also uses aggregate
but uses reshape
rather than xtabs
. It gives a data frame r
with NA's for empty cells. The last line makes the column names consistent with the prior solutions and could be omitted if this were not important.
ag <- aggregate(SLP ~ site + year, DF, mean)
r <- reshape(ag, dir = "wide", idvar = "year", timevar = "site")
names(r) <- sub(".*[.]", "", names(r))
giving:
> r
year 1 2 3
1 2005 6.0 NA NA
3 2006 NA 4.966667 NA
5 2007 NA NA 9.5
2 2008 8.6 NA 5.1
4 2009 NA 6.100000 NA
Note: The input DF
in reproducible form used is:
DF <- structure(list(SLP = c(8.6, 7.2, 6.4, 9.5, 6.1, 3.6, 8.6, 7.2,
6.4, 9.5, 6.1, 5.1, 2.1), site = c(1L, 1L, 2L, 3L, 2L, 1L, 1L,
1L, 2L, 3L, 2L, 3L, 2L), year = c(2008L, 2005L, 2006L, 2007L,
2009L, 2005L, 2008L, 2005L, 2006L, 2007L, 2009L, 2008L, 2006L
)), .Names = c("SLP", "site", "year"), class = "data.frame", row.names = c(NA,
-13L))
We can use acast
library(reshape2)
acast(df1, year~site, value.var="SLP", mean)
Or using tapply
from base R
with(df1, tapply(SLP, list(year, site), FUN = mean))
another solution
library(tidyr)
library(dplyr)
df%>%
group_by(year, site) %>%
summarise(m=mean(SLP)) %>%
spread(site, m )%>%
as.matrix()
Building on @g-grothendieck's use of xtabs, we can combine this with table
and ifelse
to return the same result.
# get a count of the number of observations per matrix cell (filling 0s with 1)
tempTab <- ifelse(with(df, table(year, + site)) == 0, 1, with(df, table(year, + site)))
tempTab
year 1 2 3
2005 3 1 1
2006 1 3 1
2007 1 1 2
2008 2 1 1
2009 1 2 1
Now use xtabs
, which returns a sum of the values when multiple observations are in a cell and divide by tempTab to get the mean.
xtabs(SLP ~ year + site, df) / tempTab
site
year 1 2 3
2005 6.000000 0.000000 0.000000
2006 0.000000 4.966667 0.000000
2007 0.000000 0.000000 9.500000
2008 8.600000 0.000000 5.100000
2009 0.000000 6.100000 0.000000
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.