简体   繁体   中英

Creating a three-way table of summary statistics in R

Example Data

I have 100 rows of patient data stored in the object example . For each patient, we know which one of five possible hospitals at which they were treated, the time period in which they were treated, and how many lymph nodes they had.

set.seed(50)

example <- data.frame(
Hospital = sample(as.factor(c("Hospital 1", "Hospital 2", "Hospital 3", "Hospital 4", "Hospital 5")), size = 100, replace = TRUE),
Time = sample(as.factor(c("2000-2002", "2003-2005", "2006-2008")), size = 100, replace = TRUE),
Nodes = sample(20:100, size = 100, replace = TRUE))

I know that I can view the summary statistics for the number of lymph nodes like so... (Note that I have appended the "n" to the rightward-most column, not sure if there is a more eloquent way to do this.)

cbind(do.call(rbind, by(example$Nodes, example$Hospital, summary)), table(example$Hospital, useNA = "no"))

             Min. 1st Qu. Median  Mean 3rd Qu. Max.   
  Hospital 1   20   34.25   54.0 55.55   77.75   90 22
  Hospital 2   22   38.75   60.5 56.25   71.75   94 20
  Hospital 3   22   37.00   51.0 57.12   81.00   96 17
  Hospital 4   25   39.75   55.5 57.11   72.25   97 28
  Hospital 5   26   42.00   50.0 57.00   77.00   99 13

Similarly, I can view them for the time period like so:

cbind(do.call(rbind, by(example$Nodes, example$Time, summary)), table(example$Time, useNA = "no"))
            Min. 1st Qu. Median  Mean 3rd Qu. Max.   
  2000-2002   20   40.00   57.0 58.84      77   97 37
  2003-2005   20   33.75   45.5 52.94      78   99 36
  2006-2008   23   39.50   61.0 58.33      72   98 27

Question

I would like to create a 3-way table table in which the leftward, outermost row identifiers are the five hospitals, further sub-stratified by time period. I want the columns to be the summary statistics for the number of lymph nodes. I have a feeling the xtabs() or ftable() might help, but have no idea how to apply them to my problem. In fact, typing ftable(example) gives me a table that is structured how I would want it to be, but the columns are not what I want. Thanks!

Edit #1 - In response to Ananda's comment below

Wow, yes that is almost exactly what I am looking for. My preference, however, would be for it to be in this format (with the numbers filled in, of course):

                     Nodes
                     Min.  1st Qu.  Median  Mean 3rd Qu.  Max.  n
Hospital   Time 
Hospital 1 2000-2002 
           2003-2005
           2006-2008
Hospital 2 2000-2002  
           2003-2005
           2006-2008

....and so forth....

Ordering the dataframe that results from the aggregate() function that @AnandaMahto mentioned above would provide something very close to what you need, but without the nested values:

    dF <- aggregate(Nodes~Hospital+Time, example, summary)
    dF <- dF[order(dF[, 1]), ]

         Hospital      Time Nodes.Min. Nodes.1st Qu. Nodes.Median Nodes.Mean Nodes.3rd Qu.
    1  Hospital 1 2000-2002      20.00         25.00        34.00      33.29         38.00
    6  Hospital 1 2003-2005      20.00         41.50        77.00      62.86         85.50
    11 Hospital 1 2006-2008      35.00         60.50        70.50      68.62         80.75
    2  Hospital 2 2000-2002      24.00         40.75        65.50      60.70         80.75
    7  Hospital 2 2003-2005      22.00         22.00        26.00      33.75         37.75
    12 Hospital 2 2006-2008      45.00         60.25        61.50      63.83         68.00
    3  Hospital 3 2000-2002      40.00         63.00        74.00      72.80         91.00
    8  Hospital 3 2003-2005      22.00         36.75        66.00      60.50         81.75
    13 Hospital 3 2006-2008      23.00         29.50        37.00      40.67         46.75
    4  Hospital 4 2000-2002      30.00         55.75        64.50      68.17         90.00
    9  Hospital 4 2003-2005      25.00         38.25        42.00      49.36         59.50
    14 Hospital 4 2006-2008      27.00         36.00        45.00      45.00         54.00
    5  Hospital 5 2000-2002      26.00         39.00        52.00      51.67         64.50
    10 Hospital 5 2003-2005      34.00         42.00        50.00      55.40         52.00
    15 Hospital 5 2006-2008      30.00         42.00        48.00      61.80         91.00
    Nodes.Max.
    1       53.00
    6       89.00
    11      90.00
    2       94.00
    7       61.00
    12      85.00
    3       96.00
    8       95.00
    13      70.00
    4       97.00
    9       89.00
    14      63.00
    5       77.00
    10      99.00
    15      98.00      

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM