Creating a three-way table of summary statistics in R

Example Data

I have 100 rows of patient data stored in the object example. For each patient, we know which of five possible hospitals they were treated at, the time period in which they were treated, and how many lymph nodes they had.

set.seed(50)

example <- data.frame(
Hospital = sample(as.factor(c("Hospital 1", "Hospital 2", "Hospital 3", "Hospital 4", "Hospital 5")), size = 100, replace = TRUE),
Time = sample(as.factor(c("2000-2002", "2003-2005", "2006-2008")), size = 100, replace = TRUE),
Nodes = sample(20:100, size = 100, replace = TRUE))

I know that I can view the summary statistics for the number of lymph nodes by hospital like so. (Note that I have appended the "n" as the rightmost column; I am not sure if there is a more elegant way to do this.)

cbind(do.call(rbind, by(example$Nodes, example$Hospital, summary)), table(example$Hospital, useNA = "no"))

             Min. 1st Qu. Median  Mean 3rd Qu. Max.   
  Hospital 1   20   34.25   54.0 55.55   77.75   90 22
  Hospital 2   22   38.75   60.5 56.25   71.75   94 20
  Hospital 3   22   37.00   51.0 57.12   81.00   96 17
  Hospital 4   25   39.75   55.5 57.11   72.25   97 28
  Hospital 5   26   42.00   50.0 57.00   77.00   99 13
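
One possibly tidier way to get the "n" column is to compute it in the same pass as the other statistics, so the column also gets a proper header. This is only a sketch: summary_n is a hypothetical helper name, not something from base R, and the printed numbers depend on your R version's sample() behaviour, so none are shown here.

```r
# Recreate the example data from the question.
set.seed(50)
example <- data.frame(
  Hospital = sample(as.factor(c("Hospital 1", "Hospital 2", "Hospital 3",
                                "Hospital 4", "Hospital 5")),
                    size = 100, replace = TRUE),
  Time = sample(as.factor(c("2000-2002", "2003-2005", "2006-2008")),
                size = 100, replace = TRUE),
  Nodes = sample(20:100, size = 100, replace = TRUE))

# Hypothetical helper: the six summary() statistics plus the group size.
summary_n <- function(x) c(summary(x), n = length(x))

# split() groups Nodes by hospital; sapply() returns one column per
# hospital, so t() is used to put the hospitals in the rows.
by_hospital <- t(sapply(split(example$Nodes, example$Hospital), summary_n))
by_hospital
```

Swapping example$Hospital for example$Time should give the per-period version in the same way.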

Similarly, I can view them by time period like so:

cbind(do.call(rbind, by(example$Nodes, example$Time, summary)), table(example$Time, useNA = "no"))
            Min. 1st Qu. Median  Mean 3rd Qu. Max.   
  2000-2002   20   40.00   57.0 58.84      77   97 37
  2003-2005   20   33.75   45.5 52.94      78   99 36
  2006-2008   23   39.50   61.0 58.33      72   98 27

Question

I would like to create a three-way table in which the leftmost, outermost row identifiers are the five hospitals, each further sub-stratified by time period. I want the columns to be the summary statistics for the number of lymph nodes. I have a feeling that xtabs() or ftable() might help, but I have no idea how to apply them to my problem. In fact, typing ftable(example) gives me a table that is structured the way I want, but the columns are not what I want. Thanks!

Edit #1 - In response to Ananda's comment below

Wow, yes, that is almost exactly what I am looking for. My preference, however, would be for it to be in this format (with the numbers filled in, of course):

                     Nodes
                     Min.  1st Qu.  Median  Mean 3rd Qu.  Max.  n
Hospital   Time 
Hospital 1 2000-2002 
           2003-2005
           2006-2008
Hospital 2 2000-2002  
           2003-2005
           2006-2008

....and so forth....

Ordering the data frame that results from the aggregate() function that @AnandaMahto mentioned above gives something very close to what you need, but without the nested values:

    dF <- aggregate(Nodes~Hospital+Time, example, summary)
    dF <- dF[order(dF[, 1]), ]

         Hospital      Time Nodes.Min. Nodes.1st Qu. Nodes.Median Nodes.Mean Nodes.3rd Qu. Nodes.Max.
    1  Hospital 1 2000-2002      20.00         25.00        34.00      33.29         38.00      53.00
    6  Hospital 1 2003-2005      20.00         41.50        77.00      62.86         85.50      89.00
    11 Hospital 1 2006-2008      35.00         60.50        70.50      68.62         80.75      90.00
    2  Hospital 2 2000-2002      24.00         40.75        65.50      60.70         80.75      94.00
    7  Hospital 2 2003-2005      22.00         22.00        26.00      33.75         37.75      61.00
    12 Hospital 2 2006-2008      45.00         60.25        61.50      63.83         68.00      85.00
    3  Hospital 3 2000-2002      40.00         63.00        74.00      72.80         91.00      96.00
    8  Hospital 3 2003-2005      22.00         36.75        66.00      60.50         81.75      95.00
    13 Hospital 3 2006-2008      23.00         29.50        37.00      40.67         46.75      70.00
    4  Hospital 4 2000-2002      30.00         55.75        64.50      68.17         90.00      97.00
    9  Hospital 4 2003-2005      25.00         38.25        42.00      49.36         59.50      89.00
    14 Hospital 4 2006-2008      27.00         36.00        45.00      45.00         54.00      63.00
    5  Hospital 5 2000-2002      26.00         39.00        52.00      51.67         64.50      77.00
    10 Hospital 5 2003-2005      34.00         42.00        50.00      55.40         52.00      99.00
    15 Hospital 5 2006-2008      30.00         42.00        48.00      61.80         91.00      98.00
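
If the nested row labels and the n column matter, one possible route is to reshape the aggregate() result to long format and hand it to xtabs() and ftable(), as the question suspected. The following is a rough sketch rather than a definitive solution: summary_n, agg, long and nested are names made up for illustration, and it assumes every Hospital/Time combination has at least one patient, because xtabs() fills empty combinations with 0.

```r
# Recreate the example data from the question.
set.seed(50)
example <- data.frame(
  Hospital = sample(as.factor(c("Hospital 1", "Hospital 2", "Hospital 3",
                                "Hospital 4", "Hospital 5")),
                    size = 100, replace = TRUE),
  Time = sample(as.factor(c("2000-2002", "2003-2005", "2006-2008")),
                size = 100, replace = TRUE),
  Nodes = sample(20:100, size = 100, replace = TRUE))

# Hypothetical helper: the six summary() statistics plus the group size.
summary_n <- function(x) c(summary(x), n = length(x))

# aggregate() stores the statistics as a matrix column named Nodes.
agg <- aggregate(Nodes ~ Hospital + Time, data = example, FUN = summary_n)
m <- agg$Nodes

# Long format: one row per Hospital/Time/statistic. as.vector() unrolls
# the matrix column by column, which matches the rep() calls below.
long <- data.frame(
  Hospital = rep(agg$Hospital, times = ncol(m)),
  Time     = rep(agg$Time, times = ncol(m)),
  Stat     = factor(rep(colnames(m), each = nrow(m)), levels = colnames(m)),
  value    = as.vector(m))

# Each cell holds exactly one value, so the "sum" done by xtabs() is just
# that value. ftable() then nests Time within Hospital in the rows.
nested <- ftable(xtabs(value ~ Hospital + Time + Stat, data = long),
                 row.vars = c("Hospital", "Time"))
nested
```

Printing nested should show each hospital label once, with its time periods listed underneath, and the statistics plus n as columns.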
