How to iterate based on a condition and assign aggregated values to a row in a new data frame in R

I have a large dataset of stock prices with 203615 rows and 2 columns (price and Timestamp), in the format below:

price (USD) | Timestamp
3.5         | 2014-01-01 20:00:00
2           | 2014-01-01 20:15:00
5           | 2014-01-01 20:15:00
...         | ...
4           | 2014-01-31 23:00:00
5           | 2014-01-31 23:00:00
4.5         | 2014-01-31 23:00:00
2.3         | 2014-01-31 23:00:00   (row 203615)

The timestamps range from "2014-01-01 20:00:00" to "2014-01-31 23:00:00" in 15-minute intervals (rounded to 15 minutes), and I have several transactions on the same timestamp. I have to group rows by timestamp using a window of one day, calculate the min, max, and mean of the price and the number of rows within the window limits, and assign them to a row in a new data frame for every iteration, from the starting date ("2014-01-02 20:00:00") until it reaches the end timestamp ("2014-01-31 23:00:00"). Note: the iteration has to be done for every 15 minutes.

I have tried a while loop. Please help me with this, and let me know if there are any packages I can use.

This is my own code, which I used to create a window of time (the prior 24 hours) to iterate over and compute min and max values for a project I am working on. In it, inter marks the interval I worked on in the loop, raw is the data frame name, and i is the specific row of raw from which the datetime value was taken.

I started my intervals at the 97th row (i in 97:nrow(raw)) because the stamps were taken at 15-minute intervals and I wanted a 24-hour backward window, so I needed to leave 96 intervals (24 hours x 4 per hour) to pull from. I could not reach back into a time I had no data for, so I started far enough into my data to leave room for those intervals.

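    for (i in 97:nrow(raw)) {
      # start of the 24-hour window that ends at row i
      inter <- raw$datetime[i] - as.difftime(24, units = "hours")
      # rows inside the window; this subsetting step is not in the original
      # snippet (temp was undefined there) and is an assumed reconstruction
      temp <- raw[raw$datetime >= inter & raw$datetime <= raw$datetime[i], ]
      raw$deltaAirTemp_24[i] <- max(temp$Air.Temperature) - min(temp$Air.Temperature)
    }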

The key is getting the timestamps into a real date-time format. Run str() on the column that holds the dates. If it comes back as anything but Factor (typically character), use:

    as.POSIXct(yourdate$field, format = "%Y-%m-%d %H:%M:%S")

If str() reports the column as Factor, wrap it in as.character() first, as in as.POSIXct(as.character(yourdate$field), format = "%Y-%m-%d %H:%M:%S"), to be sure the conversion works from the date text and not from the factor's level numbers.
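To make the factor case concrete, here is a minimal sketch. The data frame yourdate and its columns are made-up toy data, and tz = "UTC" is an assumption added only so the result does not depend on the local time zone.

```r
# Hypothetical toy data: timestamps stored as a factor
yourdate <- data.frame(
  field = factor(c("2014-01-01 20:00:00", "2014-01-01 20:15:00")),
  price = c(3.5, 2)
)

str(yourdate$field)         # reports a Factor

# Numeric coercion of a factor returns the level codes, not the times
as.numeric(yourdate$field)

# Going through character keeps the actual timestamps
yourdate$field <- as.POSIXct(as.character(yourdate$field),
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Once the column is POSIXct, difftime arithmetic works in both directions
yourdate$field[2] - as.difftime(24, units = "hours")
yourdate$field[1] + as.difftime(15, units = "mins")
```

Note that format has to be passed by name, because the second positional argument of as.POSIXct() is tz.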

Get them into a consistent date-time format, then construct something like the loop above to extract the periods you need. difftime is in base R and works well; you can use both positive and negative intervals with it. I hope this helps!
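Applied to the price data in the question, the same idea might look like the sketch below. It is only an illustration under assumptions: the data frame prices is a small made-up stand-in for the real 203615-row dataset, the window is taken as inclusive at both ends, and the names prices, ends, summarise_window and result are hypothetical.

```r
# Toy stand-in for the question's data (values and column names are assumptions)
prices <- data.frame(
  price = c(3.5, 2, 5, 4, 5, 4.5, 2.3),
  Timestamp = as.POSIXct(c(
    "2014-01-01 20:00:00", "2014-01-01 20:15:00", "2014-01-01 20:15:00",
    "2014-01-02 20:00:00", "2014-01-02 20:00:00", "2014-01-02 20:15:00",
    "2014-01-02 20:30:00"
  ), tz = "UTC")
)

# One window end every 15 minutes, starting one day after the first timestamp
ends <- seq(min(prices$Timestamp) + as.difftime(24, units = "hours"),
            max(prices$Timestamp), by = "15 min")

# Summarise the 24 hours up to and including `end` as a one-row data frame
summarise_window <- function(end) {
  start <- end - as.difftime(24, units = "hours")
  p <- prices$price[prices$Timestamp >= start & prices$Timestamp <= end]
  data.frame(window_end = end, n = length(p),
             min = min(p), max = max(p), mean = mean(p))
}

# Build the new data frame, one row per 15-minute step
result <- do.call(rbind, lapply(seq_along(ends),
                                function(k) summarise_window(ends[k])))
print(result)
```

Each row of result holds the row count, min, max and mean for one window. If a window can be empty, min() and max() return Inf and -Inf with a warning, so a guard on length(p) would be needed in that case.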
