Aggregate data on daily intervals in R

My dataset consists of several observations across three columns (time, price and volume), as follows:

time                price   volume
2017-11-15 9:35:11  301.1   1.1
2017-11-15 9:35:09  300.9   3.0
2017-11-15 9:35:07  300.8   1.4 
2017-11-15 9:35:06  300.9   0.1
2017-11-15 9:35:01  301.0   0.6

I want to start by cutting the data into periods of 24h, summing the volume over each 24h period and obtaining the price at the time the data is aggregated.

I have tried the following (the initial dataset is called "mydf" in the code):

## sum the volume over periods of 24h
mydf_volume_24h <- data.frame(volume = tapply(cbind(mydf$volume), list(cut(mydf$time, breaks = "24 hours")), sum))

## bind the previous df with the prices for each time label
mydf_24h <- setNames(cbind(rownames(mydf_volume_24h), mydf_volume_24h, row.names = NULL), c("time", "volume"))

mydf <- mydf %>%
  select(-volume)

mydf_24h <- merge(mydf, mydf_volume_24h, by = "time")

Besides (probably) not being the best or most efficient approach, this code does not give the result I need: the first part returns the sum of the volume for each 24h period, but it labels each sum with the time 23:00:00, which does not always exist in my dataset.

What I intended is to cut the data into 24h periods, but to label each period with the (real) time of the observation closest to the 24h boundary. Is there any way to do this?

This may not be exactly what you want, but from your description I gathered that you want to sum the volume for each unique day and also get the max (latest) time for each unique day. If that is indeed what you want, the code below should produce your aggregated data frame:

library(dplyr)
library(stringr)
library(lubridate)

df <- tibble(time = c(
             "2017-11-15 9:35:11",
             "2017-11-15 9:35:09",
             "2017-11-15 9:35:07",
             "2017-11-15 9:35:06",
             "2017-11-15 9:35:01",
             "2017-11-16 9:36:12",
             "2017-11-16 9:35:09",
             "2017-11-16 9:35:07",
             "2017-11-16 9:35:06",
             "2017-11-16 9:35:01"
             ),
             price = c(301.1, 300.9, 300.8, 300.9, 301.0,
                       302, 303, 304, 305, 306),
             volume = c(1.1, 3.0, 1.4, 0.1, 0.6,
                        1.4, 3.4, 1.5, 0.5, 0.6)
)

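df %>%
  mutate(time = ymd_hms(time)) %>%   # parse the strings into date-times (UTC by default)
  mutate(day = as_date(time)) %>%    # calendar day of each observation
  group_by(day) %>%
  summarize(volume = sum(volume),    # total volume per day
            maxTime = max(time))     # latest real observation time per day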
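
The question also asks for the price at the time the data is aggregated. A minimal sketch of one way to get it, assuming "the price" means the price of the latest observation in each calendar day; the sample data and the column names last_price and last_time are made up for illustration:

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data in the same shape as the question's `mydf`
mydf <- tibble(
  time   = ymd_hms(c("2017-11-15 9:35:11", "2017-11-15 9:35:01",
                     "2017-11-16 9:36:12", "2017-11-16 9:35:01")),
  price  = c(301.1, 301.0, 302, 306),
  volume = c(1.1, 0.6, 1.4, 0.6)
)

daily <- mydf %>%
  mutate(day = as_date(time)) %>%          # calendar day of each observation
  group_by(day) %>%
  summarise(
    volume     = sum(volume),              # total volume for the day
    last_price = price[which.max(time)],   # price at the latest observation
    last_time  = max(time)                 # real timestamp of that observation
  )

print(daily)
```

Note that this groups by calendar day rather than by rolling 24h windows, so every label is a timestamp that actually occurs in the data. last_price is computed from the original time column, which is why the maximum time is stored under a new name instead of overwriting time inside summarise().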
