
R: Several min values using ddply

I have a df like this:

> head(datamelt)
                  TIMESTAMP ring dendro diameter   ID Rain_mm_Tot year DOY
1373635 2013-05-02 00:00:00    1      1     3405 r1_1           0 2013 122
1373672 2013-05-02 00:15:00    1      1     3417 r1_1           0 2013 122
1373735 2013-05-02 00:30:00    1      1     3417 r1_1           0 2013 122
1373777 2013-05-02 00:45:00    1      1     3426 r1_1           0 2013 122
1373826 2013-05-02 01:00:00    1      1     3438 r1_1           0 2013 122
1373873 2013-05-02 01:15:00    1      1     3444 r1_1           0 2013 122

I used ddply to get the min diameter value for each day of the year (DOY) and each dendro (dendrometer) in two different ways: i) The first one does its job, returning one row for each day and dendro; nrow(dailymin)=5784. However, I don't know what it does when there are several tied min values in one day, and in those cases the result is not what I need:

library(plyr)
dailymin <- ddply(datamelt, .(year,DOY,ring,dendro), function(x)x[which.min(x$diameter), ])

ii) The second way returns several rows for each day if there are several min values, which is OK. nrow(dailymin)=12634:

dailymin <- ddply(datamelt, .(year,DOY,ring,dendro), function(x)x[x$diameter==min(x$diameter), ])

Here are my questions:
- How does way i) work when there are several min values?
- More importantly, in way ii), when there are several min values, how can I keep only the one that occurs latest in time? For example, imagine that for dendro #5 in ring #2 there are 3 min values with a diameter of 3598mm, occurring at 13:15, 13:30 and 13:45. I would like to keep only the last one (13:45) and remove the others.

> str(dailymin$TIMESTAMP)
 POSIXct[1:12634], format: "2013-05-02 13:45:00" "2013-05-02 08:45:00" "2013-05-02 14:00:00" "2013-05-02 13:45:00" "2013-05-02 14:45:00" ...

Thanks

To your first question: as documented, which.min returns the index of the first minimum encountered in x$diameter.
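To see the tie behaviour on a small case, here is a minimal sketch using a made-up diameter vector (the values are hypothetical, not taken from datamelt). It compares which.min with the equality test used in way ii):

```r
# Hypothetical diameters for one day; the minimum 3598 occurs three times.
diameter <- c(3610, 3598, 3598, 3598, 3605)

# which.min reports only the position of the first minimum.
first_min <- which.min(diameter)

# The equality test from way ii) reports every tied position.
all_min <- which(diameter == min(diameter))
```

That difference is why way i) returns one row per group while way ii) can return several.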

To your second question: assuming x is already sorted by increasing TIMESTAMP (which seems to be the case in your example), you can write your own last.min function that works like which.min but returns the index of the last minimum:

last.min <- function(x) length(x) - which.min(rev(x)) + 1L
dailymin <- ddply(datamelt, .(year, DOY, ring, dendro),
                  function(x)x[last.min(x$diameter), ])
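As a quick check of last.min outside ddply, the sketch below applies it to a made-up vector that mirrors the 13:15 / 13:30 / 13:45 example from the question. The diameter and timestamps objects are hypothetical and assumed to be sorted by increasing time:

```r
last.min <- function(x) length(x) - which.min(rev(x)) + 1L

# Hypothetical readings every 15 minutes from 13:00 to 14:00, sorted by time.
timestamps <- as.POSIXct("2013-05-02 13:00:00", tz = "UTC") + 15 * 60 * (0:4)
diameter   <- c(3610, 3598, 3598, 3598, 3605)

# Reversing the vector makes the last minimum the first one which.min sees;
# the arithmetic maps that position back to the original ordering.
idx <- last.min(diameter)
last_time <- format(timestamps[idx], "%H:%M")
```

Here idx points at the fourth reading, the latest of the three tied minima.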

If your data is not sorted by TIMESTAMP, you could use arrange to sort it by decreasing TIMESTAMP, then use which.min:

dailymin <- ddply(datamelt, .(year, DOY, ring, dendro),
                  function(x) {
                     # Latest rows first, so the first minimum found is the latest one
                     x <- arrange(x, desc(TIMESTAMP))
                     x[which.min(x$diameter), ]
                  })
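Another option, closer to way ii) in the question, is to keep the tied rows and then pick the one with the largest TIMESTAMP, which does not depend on row order. This is only a sketch: latest_min_row and the toy data frame are hypothetical names introduced here, and it assumes diameter contains no NA values:

```r
# Hypothetical helper: among the rows tied for the minimum diameter,
# keep the one with the latest TIMESTAMP (row order does not matter).
latest_min_row <- function(x) {
  ties <- x[x$diameter == min(x$diameter), , drop = FALSE]
  ties[which.max(as.numeric(ties$TIMESTAMP)), , drop = FALSE]
}

# Made-up data for a single day and dendrometer, deliberately unsorted.
toy <- data.frame(
  TIMESTAMP = as.POSIXct("2013-05-02 13:00:00", tz = "UTC") +
    15 * 60 * c(3, 0, 2, 1, 4),
  diameter  = c(3598, 3610, 3598, 3598, 3605)
)

res <- latest_min_row(toy)
res_time <- format(res$TIMESTAMP, "%H:%M")
```

Since the helper takes a data frame and returns a one-row data frame, it should be usable as the function argument of ddply, for example ddply(datamelt, .(year, DOY, ring, dendro), latest_min_row).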


 