Apologies if this has been answered elsewhere; I have searched all over but cannot find the answer. My problem may be due to the way I am searching, so I decided to post on StackOverflow, where I can present my problem with examples.
I have five-minute OHLC data (DIA_5.csv), to which I added a day-of-year column using lubridate:
library(lubridate)
DIA_5[,6]<- yday(DIA_5[,1])
The result looks like this:
Date Open High Low Close DOY
1 2015-09-21 09:30:00 164.6700 164.7100 164.3700 164.5300 264
2 2015-09-21 09:35:00 164.5300 164.9000 164.5300 164.6400 264
3 2015-09-21 09:40:00 164.6600 164.8900 164.6000 164.8900 264
4 2015-09-21 09:45:00 164.9100 165.0900 164.9100 164.9736 264
5 2015-09-21 09:50:00 164.9399 165.0980 164.8200 164.8200 264
What I wanted to do was create a new data frame whose first column holds the unique day-of-year numbers, and then fill it by subsetting the original OHLC data frame on each day-of-year number. The aim is that, in the new data frame, I can extract the maximum of all the highs on day x into a column, and likewise for the other variables. The closest I could get was the following code; however, it returns all the columns from the OHLC data, and I cannot find a way to change it so that only the day-of-year number is brought across to the new data frame.
DF<-DIA_5[match(unique(DIA_5[,6]), DIA_5[,6]),]
row.names DATE OPEN HIGH LOW CLOSE DOY
1 1 2015-09-21 09:30:00 164.67 164.7100 164.370 164.5300 264
2 79 2015-09-22 09:30:00 162.62 162.9600 162.620 162.7544 265
3 157 2015-09-23 09:30:00 163.26 163.3800 162.980 163.1400 266
4 235 2015-09-24 09:30:00 161.12 161.3700 161.060 161.2300 267
5 313 2015-09-25 09:30:00 163.81 163.9100 163.570 163.5800 268
Despite getting more data than I needed from the above code, I decided to try subsetting. For each row, I wanted to take the DOY value (e.g. 264), use it as a filter on the main OHLC data frame, and then extract the highest value from the column of highs. Using
DF[,6] <- max(subset(DIA_5[,3], yday(DIA_5[,1]) == DF[,6] ))
gave me
Warning message:
In yday(DIA_5[, 1]) == DF[, 6] :
longer object length is not a multiple of shorter object length
It did give a new column in the data frame, but this had the same value repeated in every row.
row.names DATE OPEN HIGH LOW CLOSE DOY
1 1 2015-09-21 09:30:00 164.67 164.7100 164.370 164.5300 179.02
2 79 2015-09-22 09:30:00 162.62 162.9600 162.620 162.7544 179.02
3 157 2015-09-23 09:30:00 163.26 163.3800 162.980 163.1400 179.02
4 235 2015-09-24 09:30:00 161.12 161.3700 161.060 161.2300 179.02
5 313 2015-09-25 09:30:00 163.81 163.9100 163.570 163.5800 179.02
6 391 2015-09-28 09:30:00 162.04 162.0600 161.660 161.7100 179.02
I tried using my subset syntax to pull the max high value for a single DOY number, and it seems to work fine:
h <- max(subset(DIA_5[,3], yday(DIA_5[,1]) == DF[1,6] ))
But I just cannot find out how to do this so that it creates a new column of the MAX value in the high column on x day of year.
Any help with this would be very much appreciated.
You can use dplyr.
I created some fake data, which looks like this, and stored it in df:
Date Open High Low Close DOY
1 2015-09-21 164.6700 164.710 164.37 164.5300 264
2 2015-09-21 164.5300 164.900 164.53 164.6400 264
3 2015-09-21 164.6600 164.890 164.60 164.8900 264
4 2015-09-22 164.9100 165.090 164.91 164.9736 265
5 2015-09-22 164.9399 165.098 164.82 164.8200 265
6 2015-09-22 162.6200 162.960 162.62 162.7544 265
7 2015-09-23 163.2600 163.380 162.98 163.1400 266
8 2015-09-23 161.1200 161.370 161.06 161.2300 266
9 2015-09-23 163.8100 163.910 163.57 163.5800 266
library(dplyr)
x <- df %>%
group_by(DOY) %>%
filter(High == max(High)) %>%
as.data.frame()
x
Date Open High Low Close DOY
1 2015-09-21 164.5300 164.900 164.53 164.64 264
2 2015-09-22 164.9399 165.098 164.82 164.82 265
3 2015-09-23 163.8100 163.910 163.57 163.58 266
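The filter() call above keeps the whole row on which each day's High peaks. If the goal is instead one summary row per day (daily open, high, low and close), summarise() may be a closer fit. The following is only a minimal sketch: the sample values are hypothetical, the column names DayOpen, DayHigh, DayLow and DayClose are made up for illustration, and it assumes the rows are already sorted by time within each day.

```r
library(dplyr)

# Hypothetical sample mirroring the fake data above (two days, three bars each)
df <- data.frame(
  Open  = c(164.67, 164.53, 164.66, 164.91, 164.9399, 162.62),
  High  = c(164.71, 164.90, 164.89, 165.09, 165.098, 162.96),
  Low   = c(164.37, 164.53, 164.60, 164.91, 164.82, 162.62),
  Close = c(164.53, 164.64, 164.89, 164.9736, 164.82, 162.7544),
  DOY   = c(264, 264, 264, 265, 265, 265)
)

# Collapse each day to a single row.
# first()/last() rely on the rows being in time order within each DOY.
daily <- df %>%
  group_by(DOY) %>%
  summarise(
    DayOpen  = first(Open),
    DayHigh  = max(High),
    DayLow   = min(Low),
    DayClose = last(Close)
  ) %>%
  as.data.frame()

print(daily)
```

Using new names for the summary columns avoids any ambiguity about whether a later expression inside summarise() refers to the original column or to a summary created earlier in the same call.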
aggregate is a fine 'one-liner' for this:
#simulate some time series and place in data.frame
set.seed(1)
d = data.frame(replicate(5,cumsum(rnorm(2000))))
d$doy = sort(sample(1:364,2000,replace=T))
print(d[d$doy==1,])
X1 X2 X3 X4 X5 doy
1 -0.6264538 -0.88614959 -1.1346302 -0.6188271 0.2637034 1
2 -0.4428105 -2.80840448 -0.3700731 -1.7282490 -0.5657484 1
3 -1.2784391 -1.18870374 0.2006371 -3.8985843 -2.0273832 1
4 0.3168417 -0.66943383 -1.1510569 -3.9298873 -0.3433930 1
5 0.6463495 -0.72528376 -3.1809423 -4.1902858 -1.8877173 1
6 -0.1741189 -0.02886615 -2.5904637 -3.6558553 -2.0786045 1
7 0.3133101 0.02464952 -4.0035337 -4.2152947 -1.0623928 1
8 1.0516348 -1.28563397 -2.3931921 -2.6069245 -0.5152666 1
9 1.6274162 -3.40870003 -0.5527496 -2.0502848 0.2398875 1
#aggregate data by DOY and compute the max of each column
maxPerDOY.df = aggregate(d[1:5],list(doy=d$doy),max)
print(head(maxPerDOY.df,3))
doy X1 X2 X3 X4 X5
1 1 1.627416 0.02464952 0.2006371 -0.6188271 0.2637034
2 2 3.223652 -2.76920768 0.8155484 -1.8646623 2.1378466
3 3 3.216576 -3.39431265 -0.8062283 -0.6656144 2.9014736
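As a side note on the warning in the question: comparing the full-length yday(...) vector against the much shorter DF[,6] makes R recycle the shorter vector, and max() then collapses whatever matched into a single number, which is why every row received the same value. A base R sketch of two alternatives follows; the data is a hypothetical sample, and DayHigh is a made-up column name. aggregate() with a formula gives one row per day, while ave() returns a vector as long as the original data, so it can be attached as a new column.

```r
# Hypothetical sample: High values for two days
df <- data.frame(
  High = c(164.71, 164.90, 164.89, 165.09, 165.098, 162.96),
  DOY  = c(264, 264, 264, 265, 265, 265)
)

# One row per day: the maximum High within each DOY
daily_high <- aggregate(High ~ DOY, data = df, FUN = max)

# Same length as df: every row carries the maximum High of its own day
df$DayHigh <- ave(df$High, df$DOY, FUN = max)

print(daily_high)
print(df)
```

Note that FUN must be passed by name in ave(), because any unnamed arguments after the first are treated as grouping variables.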
Using the advice given by Teja K, I managed to code all the subsetting that was required for my project. dplyr is an excellent package and designed exactly for this. The syntax is also incredibly easy for noobs like me. Thanks for all the help, guys.