Subsetting a dataframe based on daily maxima
I have a CSV file exported from Excel with a date/time column and a value associated with each date/time. I'm trying to write a script that will go through this format (see below) and find 1) the maximum value per day, and 2) the time on that day at which the maximum occurs. Preferably R would return both values to me in a new dataframe.
The data looks something like this:
V1 V2 V3
1 5/1/2012 3:00 1
2 5/1/2012 6:00 2
3 5/1/2012 9:00 5
4 5/1/2012 12:00 3
5 5/1/2012 15:00 6
6 5/1/2012 18:00 2
7 5/1/2012 21:00 1
8 5/2/2012 0:00 2
9 5/2/2012 3:00 3
10 5/2/2012 6:00 6
11 5/2/2012 9:00 4
12 5/2/2012 12:00 6
13 5/2/2012 15:00 7
14 5/2/2012 18:00 9
15 5/2/2012 21:00 1
So the function I'm envisioning would return:
1 5/1/2012 15:00 6
2 5/2/2012 18:00 9
Any ideas?
Here is a solution using the plyr package, which I find very elegant for problems like this.
dat.str <- ' V1 V2 V3
1 5/1/2012 3:00 1
2 5/1/2012 6:00 2
3 5/1/2012 9:00 5
4 5/1/2012 12:00 3
5 5/1/2012 15:00 6
6 5/1/2012 18:00 2
7 5/1/2012 21:00 1
8 5/2/2012 0:00 2
9 5/2/2012 3:00 3
10 5/2/2012 6:00 6
11 5/2/2012 9:00 4
12 5/2/2012 12:00 6
13 5/2/2012 15:00 7
14 5/2/2012 18:00 9
15 5/2/2012 21:00 1'
dat <- read.table(textConnection(dat.str), row.names=1, header=TRUE)
library(plyr)
ddply(dat, .(V1), function(x){
x[which.max(x$V3), ]
})
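One caveat with which.max: it returns only the first position of the maximum, so if a day reaches its peak at two different times, only the earliest row is kept. If ties matter, the following is a minimal base R sketch that keeps every tied row. The data frame dat_demo and its values are made up for illustration; only the V1/V2/V3 layout is taken from the question.

```r
# Hypothetical toy data: day "5/1/2012" has a tie at its maximum (6).
dat_demo <- data.frame(
  V1 = c("5/1/2012", "5/1/2012", "5/1/2012", "5/2/2012", "5/2/2012"),
  V2 = c("3:00", "9:00", "15:00", "0:00", "18:00"),
  V3 = c(1, 6, 6, 2, 9),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum of V3 within that row's day.
day_max <- ave(dat_demo$V3, dat_demo$V1, FUN = max)

# Keep every row whose value equals its day's maximum (ties included).
all_max <- dat_demo[dat_demo$V3 == day_max, ]
print(all_max)
```

With real measurements that are floating-point numbers, comparing with == may be too strict, so this is probably best suited to integer or rounded values.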
If you are dealing with time series data, I suggest you use a time series class like zoo or xts.
dat <- read.table(text=" V1 V2 V3
1 5/1/2012 3:00 1
2 5/1/2012 6:00 2
3 5/1/2012 9:00 5
4 5/1/2012 12:00 3
5 5/1/2012 15:00 6
6 5/1/2012 18:00 2
7 5/1/2012 21:00 1
8 5/2/2012 0:00 2
9 5/2/2012 3:00 3
10 5/2/2012 6:00 6
11 5/2/2012 9:00 4
12 5/2/2012 12:00 6
13 5/2/2012 15:00 7
14 5/2/2012 18:00 9
15 5/2/2012 21:00 1", row.names=1, header=TRUE)
require("xts")
# create an xts object
xobj <- xts(dat[, 3], order.by=as.POSIXct(paste(dat[, 1], dat[, 2]), format="%m/%d/%Y %H:%M"))
If you just want the daily maxima, and you are okay with using the last time of the day as the index, you can use apply.daily:
apply.daily(xobj, max)
# [,1]
#2012-05-01 21:00:00 6
#2012-05-02 21:00:00 9
To keep the timestamp at which each maximum occurs, you could do this:
do.call(rbind, lapply(split(xobj, "days"), function(x) x[which.max(x), ]))
# [,1]
#2012-05-01 15:00:00 6
#2012-05-02 18:00:00 9
split(xobj, "days") creates a list with one day's data in each element. lapply applies a function to each day; the function, in this case, simply returns the max observation for each day. The lapply call will return a list of xts objects. To turn it back into a single xts object, use do.call.
do.call(rbind, X) constructs a call to rbind using each element of the list. It is equivalent to rbind(X[[1]], X[[2]], ..., X[[n]]).
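To make the do.call equivalence concrete, here is a small sketch using plain data frames instead of xts objects, so it runs without any packages. The list pieces and its values are hypothetical and only stand in for the per-day results that lapply would return.

```r
# Toy stand-in for the list returned by lapply(): one one-row piece per day.
pieces <- list(
  data.frame(day = "2012-05-01", value = 6),
  data.frame(day = "2012-05-02", value = 9)
)

# do.call() passes each list element to rbind() as a separate argument ...
combined <- do.call(rbind, pieces)

# ... which is the same call you would otherwise write out by hand.
by_hand <- rbind(pieces[[1]], pieces[[2]])

print(combined)
print(by_hand)
```

The do.call form is the one to prefer when the number of days is not known in advance, since the hand-written call would need one argument per list element.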
As another alternative, you could use data.table:
library(data.table)
dat_table <- data.table(dat)
# flag the rows that hold each day's maximum, keep them, then drop the helper column
dat_table[, list(is_max = V3 == max(V3), V2, V3), by = 'V1'][which(is_max), ][, is_max := NULL]
EDIT: as per @MattDowle's comment, here is an even simpler data.table solution:
dat_table[, .SD[which.max(V3)], by=V1]
Here you go:
dat.str <- ' V1 V2 V3
1 5/1/2012 3:00 1
2 5/1/2012 6:00 2
3 5/1/2012 9:00 5
4 5/1/2012 12:00 3
5 5/1/2012 15:00 6
6 5/1/2012 18:00 2
7 5/1/2012 21:00 1
8 5/2/2012 0:00 2
9 5/2/2012 3:00 3
10 5/2/2012 6:00 6
11 5/2/2012 9:00 4
12 5/2/2012 12:00 6
13 5/2/2012 15:00 7
14 5/2/2012 18:00 9
15 5/2/2012 21:00 1'
dat <- read.table(textConnection(dat.str), row.names=1, header=TRUE)
do.call(rbind,
by(dat, INDICES=dat$V1, FUN=function(x) tail(x[order(x$V3), ], 1)))
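Since the question asks for the time of each maximum, it may help to merge the separate date and time columns of the result into one date-time column. The sketch below assumes the result has the same V1/V2/V3 layout as above; the data frame res, the column name when, and the choice of tz = "UTC" are assumptions for illustration, so substitute the time zone your data was recorded in.

```r
# Hypothetical result in the V1 (date) / V2 (time) / V3 (value) layout.
res <- data.frame(
  V1 = c("5/1/2012", "5/2/2012"),
  V2 = c("15:00", "18:00"),
  V3 = c(6, 9),
  stringsAsFactors = FALSE
)

# Paste date and time together and parse them as month/day/year hour:minute.
# tz = "UTC" is an assumption; use the time zone of the original recordings.
res$when <- as.POSIXct(paste(res$V1, res$V2),
                       format = "%m/%d/%Y %H:%M", tz = "UTC")

# Keep just the timestamp and the daily maximum.
out <- res[, c("when", "V3")]
print(out)
```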