
In R, how can I split a data frame by date?

I have a data frame in which one column is a date-time (chron). I would like to split it into a list of data frames by the date part only, so that each data frame holds all the data for one day. I looked at the split function, but I am not sure how to split on only part of a column's value.

Say you have this data.frame:

> df <- data.frame(date = rep(seq.POSIXt(as.POSIXct("2010-01-01 15:26"), by = "day", length.out = 3), each = 3), var = rnorm(9))
> df
                 date         var
1 2010-01-01 15:26:00 -0.02814237
2 2010-01-01 15:26:00 -0.26924825
3 2010-01-01 15:26:00 -0.57968310
4 2010-01-02 15:26:00  0.88089757
5 2010-01-02 15:26:00 -0.79954092
6 2010-01-02 15:26:00  1.87145778
7 2010-01-03 15:26:00  0.93234835
8 2010-01-03 15:26:00  1.29130038
9 2010-01-03 15:26:00 -1.09841234

To split by day, you just need:

> split(df, as.Date(df$date))
$`2010-01-01`
                 date         var
1 2010-01-01 15:26:00 -0.02814237
2 2010-01-01 15:26:00 -0.26924825
3 2010-01-01 15:26:00 -0.57968310

$`2010-01-02`
                 date        var
4 2010-01-02 15:26:00  0.8808976
5 2010-01-02 15:26:00 -0.7995409
6 2010-01-02 15:26:00  1.8714578

$`2010-01-03`
                 date        var
7 2010-01-03 15:26:00  0.9323484
8 2010-01-03 15:26:00  1.2913004
9 2010-01-03 15:26:00 -1.0984123
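
Once the data frame is split, each day's rows can be pulled out by name or processed in one pass. The following is a minimal sketch, assuming the same kind of data as above; the fixed `tz = "UTC"`, the seed, and the names `by_day` and `daily_mean` are illustrative choices that do not appear in the original answer:

```r
# Rebuild the example data with a fixed time zone so results are reproducible
set.seed(1)
start <- as.POSIXct("2010-01-01 15:26", tz = "UTC")
df <- data.frame(
  date = rep(seq(start, by = "day", length.out = 3), each = 3),
  var  = rnorm(9)
)

# Split on the date part only; the list names are the dates as strings
by_day <- split(df, as.Date(df$date))

names(by_day)            # one entry per calendar day
by_day[["2010-01-02"]]   # all rows recorded on 2 January 2010

# Apply a function to each day's data frame, e.g. the daily mean of var
daily_mean <- sapply(by_day, function(d) mean(d$var))
```

Because the result is an ordinary named list, `lapply` and `sapply` work on it directly.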

EDIT:

The above method also works with chron date-time objects:

> library(chron)
> x <- chron(dates = "02/27/92", times = "22:29:56")
> x
[1] (02/27/92 22:29:56)
> as.Date(x)
[1] "1992-02-27"

EDIT 2

It is crucial to make sure that as.Date doesn't shift your dates, as this example shows:

# I'm using "DSTday" to make a sequence of whole _apparent_ days
> x <- seq.POSIXt(as.POSIXct("2010-03-27 00:31"), by="DSTday", length.out=3)
> x
[1] "2010-03-27 00:31:00 GMT" "2010-03-28 00:31:00 GMT" "2010-03-29 00:31:00 BST"
> as.Date(x)
[1] "2010-03-27" "2010-03-28" "2010-03-28"

The third item falls in summer time (BST). By default, as.Date converts a POSIXct value using UTC, so 00:31 BST is read as 23:31 on the previous day, that is, one hour earlier. To avoid this:

> as.Date(cut(x, "DSTday"))
[1] "2010-03-27" "2010-03-28" "2010-03-29"
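
Another way to keep the local calendar day, sketched here as an alternative to `cut`, is to pass the time zone to `as.Date` explicitly. The zone `"Europe/London"` is an assumption inferred from the GMT/BST labels in the output above; substitute the zone your data was recorded in:

```r
# Assumed time zone: Europe/London (GMT in winter, BST in summer)
tz <- "Europe/London"
x <- seq(as.POSIXct("2010-03-27 00:31", tz = tz), by = "DSTday", length.out = 3)

# Default conversion uses UTC, so 00:31 BST on the 29th maps to the 28th
as.Date(x)

# Supplying the zone keeps each timestamp on its local calendar day
as.Date(x, tz = tz)

# Formatting in the timestamps' own zone gives the same grouping key as text
format(x, "%Y-%m-%d")
```

Either of the last two results can be passed to `split` as the grouping vector.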

The trick is to create a vector that tells R how to split the data. So, in your example, we have a data frame:

dd = data.frame(x = runif(100), data = paste0(1:4, "/05/13"))
## This step will depend on your data structure
dd$date = strptime(dd$data, "%d/%m/%y")

Note that I've made the date column have class POSIXlt (which also inherits from POSIXt). This allows easy manipulation of dates.

Next, I'll create the variable I'm going to split on, split_date. Basically, I subtract the minimum date from every date, which gives a difference in seconds, and divide by the number of seconds in a day:

split_date = (dd$date - min(dd$date)) / 86400

Since this can result in fractions, I'll round down to a whole number of days:

split_date = floor(split_date)

Now I use the split function in the standard way:

split_by_day = split(dd, split_date)
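
To check the result, it can help to look at how many rows landed in each group. The sketch below repeats the steps above with two substitutions that are not in the original answer: `difftime` is asked for `units = "days"` directly instead of dividing by 86400, and the dates are parsed with an assumed `tz = "UTC"`. The name `day_index` is illustrative:

```r
# Example data: 100 rows cycling through 1-4 May 2013
dd <- data.frame(x = runif(100), data = paste0(1:4, "/05/13"))
dd$date <- as.POSIXct(strptime(dd$data, "%d/%m/%y", tz = "UTC"))

# Whole days elapsed since the earliest date, requested directly in days
day_index <- floor(as.numeric(difftime(dd$date, min(dd$date), units = "days")))

split_by_day <- split(dd, day_index)

names(split_by_day)          # "0" "1" "2" "3": days since the first date
sapply(split_by_day, nrow)   # 25 rows in each group
```

Naming the units explicitly avoids depending on which unit the subtraction happens to pick.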
