简体   繁体   中英

How to create the matrix from data frame?

i have a data frame as follows, I want to create a matrix, where the average sleep duration hour (SLP) is shown according to 3 recruiting site(site) and 5 recruiting year(year).

SLP site year 
8.6  1   2008  
7.2  1   2005  
6.4  2   2006  
9.5  3   2007  
6.1  2   2009  
3.6  1   2005 
8.6  1   2008  
7.2  1   2005  
6.4  2   2006  
9.5  3   2007  
6.1  2   2009  
5.1  3   2008 
2.1  2   2006  

My desired output is:

       1      2      3 
2005  6.00    -      -
2006   -     4.97    -
2007   -      -     9.5
2008  8.60    -     5.1
2009   -     6.10    -

Column name is variable of site, row name is variable of year and values in each cell is average of SLP. How do I do this?

Here are a few different solutions which use no packages:

1) tapply This uses no packages. It produces a "matrix" output with NA values for the empty cells:

tapply(DF$SLP, DF[c("year", "site")], mean)

giving:

      site
year     1        2   3
  2005 6.0       NA  NA
  2006  NA 4.966667  NA
  2007  NA       NA 9.5
  2008 8.6       NA 5.1
  2009  NA 6.100000  NA

2) aggregate/xtabs This uses aggregate + xtabs . This creates an object of class c("xtabs", "table") with zero values for empty cells:

fo <- SLP ~ year + site
xtabs(fo, aggregate(fo, DF, mean))

giving;

      site
year          1        2        3
  2005 6.000000 0.000000 0.000000
  2006 0.000000 4.966667 0.000000
  2007 0.000000 0.000000 9.500000
  2008 8.600000 0.000000 5.100000
  2009 0.000000 6.100000 0.000000

3) aggregate/reshape This also uses aggregate but uses reshape rather than xtabs . It gives a data frame r with NA's for empty cells. The last line makes the column names consistent with the prior solutions and could be omitted if this were not important.

 ag <- aggregate(SLP ~ site + year, DF, mean)
 r <- reshape(ag, dir = "wide", idvar = "year", timevar = "site")
 names(r) <- sub(".*[.]", "", names(r))

giving:

> r
  year   1        2   3
1 2005 6.0       NA  NA
3 2006  NA 4.966667  NA
5 2007  NA       NA 9.5
2 2008 8.6       NA 5.1
4 2009  NA 6.100000  NA

Note: The input DF in reproducible form used is:

DF <- structure(list(SLP = c(8.6, 7.2, 6.4, 9.5, 6.1, 3.6, 8.6, 7.2, 
6.4, 9.5, 6.1, 5.1, 2.1), site = c(1L, 1L, 2L, 3L, 2L, 1L, 1L, 
1L, 2L, 3L, 2L, 3L, 2L), year = c(2008L, 2005L, 2006L, 2007L, 
2009L, 2005L, 2008L, 2005L, 2006L, 2007L, 2009L, 2008L, 2006L
)), .Names = c("SLP", "site", "year"), class = "data.frame", row.names = c(NA, 
-13L))

We can use acast

library(reshape2)
acast(df1, year~site, value.var="SLP", mean)

Or using tapply from base R

with(df1, tapply(SLP, list(year, site), FUN = mean))

another solution

library(tidyr)
library(dplyr)

df%>% 
  group_by(year, site) %>%
    summarise(m=mean(SLP)) %>%
  spread(site, m )%>%
as.matrix()

Building on @g-grothendieck's use of xtabs, we can combine this with table and ifelse to return the same result.

# get a count of the number of observations per matrix cell (filling 0s with 1)
tempTab <- ifelse(with(df, table(year, + site)) == 0, 1, with(df, table(year, + site)))

tempTab

year   1 2 3
  2005 3 1 1
  2006 1 3 1
  2007 1 1 2
  2008 2 1 1
  2009 1 2 1

Now use xtabs , which returns a sum of the values when multiple observations are in a cell and divide by tempTab to get the mean.

xtabs(SLP ~ year + site, df) / tempTab
      site
year          1        2        3
  2005 6.000000 0.000000 0.000000
  2006 0.000000 4.966667 0.000000
  2007 0.000000 0.000000 9.500000
  2008 8.600000 0.000000 5.100000
  2009 0.000000 6.100000 0.000000

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM