R - ddply summarise using nlevels() does not work

When using the plyr package to summarise my data, it seems impossible to use the nlevels() function.

The structure of my data set is as follows:

> aer <- read.xlsx("XXXX.xlsx", sheetIndex=1)
> aer$ID <- as.factor(aer$ID)
> aer$description <- as.factor(aer$description)
> head(aer)

  ID SOC      start        end days count severity relation
1  1 410 2015-04-21 2015-04-28    7     1        1        3
2  1 500 2015-01-30 2015-05-04   94     1        1        3
3  1 600 2014-11-25 2014-11-29    4     1        1        3
4  1 600 2015-01-02 2015-01-07    5     1        1        3
5  1 600 2015-01-26 2015-03-02   35     1        1        3
6  1 600 2015-04-14 2015-04-17    3     1        1        3

> dput(head(aer,4))
structure(list(ID = structure(c(1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "13", "14", 
"15"), class = "factor"), SOC = c(410, 500, 600, 600),  
start = structure(c(16546, 16465, 16399, 16437), class = "Date"), 
end = structure(c(16553, 16559, 16403, 16442), class = "Date"), 
days = c(7, 94, 4, 5), count = c(1, 1, 1, 1), severity = c(1, 
1, 1, 1), relation = c(3, 3, 3, 3)), .Names = c("ID", "SOC", 
"description", "start", "end", "days", "count", "severity", "relation"
), row.names = c(NA, 4L), class = "data.frame")

When I split the data set by the variable "SOC", I would like to know how many distinct levels of the "ID" variable occur in each resulting subset. I want to summarise this information, together with some other variables, in a new data set. I therefore tried the plyr package like so:

summaer2 <- ddply(aer, c("SOC"), summarise,
    participants    = nlevels(ID), 
    events          = sum(count),
    min_duration    = min(days), 
    max_duration    = max(days),
    max_severity    = max(severity))

This returns the following error:

Error in Summary.factor(c(4L, 5L, 11L, 11L, 14L, 14L), na.rm = FALSE) : 
‘max’ not meaningful for factors

Could someone advise me on how to reach my goal, or tell me what I'm doing wrong?

Many thanks in advance!

Update:

Replacing nlevels(ID) with length(unique(ID)) seems to give me the desired output:

> head(summaer2)
   SOC participants events min_duration max_duration max_severity
1  100            4      7            1           62            2
2  410            9     16            1           41            2
3  431            2      2          109          132            1
4  500            5      9           23          125            2
5  600            8     19            1           35            1
6 1040            1      1           98           98            2
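
A possible reason for the difference, sketched here on a small hypothetical data frame (`toy` is made up for illustration and is not the data set above): subsetting a factor keeps all of its original levels, so nlevels() reports every level defined on the factor, while length(unique()) counts only the values actually present in the subset. The sketch uses base R only, so it does not depend on plyr.

```r
# Hypothetical toy data, not the data set from the question
toy <- data.frame(
  ID  = factor(c("1", "1", "2", "3", "3")),
  SOC = c(410, 410, 410, 500, 500)
)

# A subset of a factor still carries every level of the original factor
id_410 <- toy$ID[toy$SOC == 410]

n_all_levels  <- nlevels(id_410)              # all levels defined on the factor
n_present     <- length(unique(id_410))       # only the values present in the subset
n_after_drop  <- nlevels(droplevels(id_410))  # levels left after dropping unused ones

# Per-SOC count of distinct IDs, using base R instead of ddply
participants <- tapply(toy$ID, toy$SOC, function(x) length(unique(x)))
print(participants)
```

If this is indeed what happens inside the ddply call, nlevels(droplevels(ID)) should be an alternative to length(unique(ID)) in the summarise step, though that has not been checked against the original data.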
