I'm new to Stackexchange, so I apologize in advance if I ask this question incorrectly.
Here is the background. I am trying to establish a recommended planting date for wheat, based on the last day in spring where one could reasonably expect to see at least 10 more inches of rain before the dry summer begins.
I have a dataset that looks like this:
Site Date Year DayOfYear DayofRun AveTemp MaxTemp MinTemp Precip TotPre
1 EelRiver 1/1/02 2002 1 1 53.6 57 51 1.01 NA
2 EelRiver 1/2/02 2002 2 2 52.5 64 43 1.30 NA
3 EelRiver 1/3/02 2002 3 3 46.6 60 42 0.56 NA
4 EelRiver 1/4/02 2002 4 4 45.7 57 41 0.00 NA
5 EelRiver 1/5/02 2002 5 5 51.0 57 46 0.53 NA
6 EelRiver 1/6/02 2002 6 6 57.9 60 55 1.70 NA
What I want to do is populate the column TotPre with the total precipitation from that date forward up to Aug 1st.
I know that, ideally, I would avoid explicit loops, but I was stumped by the fact that it seems like I need to calculate a sum on a subset that varies based on what observation I'm working with. So, using a for loop, here was how I attempted to do it:
eelriverdata <- read.csv(file="EelRiverCamp.csv",head=TRUE,sep=",")
for (i in nrow(eelriverdata)) {
tempYear <- eelriverdata[i,"Year"]
AugIndex <- which(eelriverdata[,"Year"]==tempYear & eelriverdata[,"DayOfYear"] == 213)
if (i < AugIndex) {
Tot <- sum(eelriverdata[i:AugIndex,"Precip"])
eelriverdata$TotPre[i] <- Tot
}
else {eelriverdata$TotPre[i] <- 0}
}
The problem I faced was that only the last observation in TotPre would end up populated at the end of executing the loop, with the rest of the values remaining NA. Something is happening where either the value is lost or overwritten with each iteration of the for loop. I did some research, but could not find anything other than the mysterious info that for loops do "unexpected things" with data frames.
So, does anyone know:
a) How to make the changes to the data frame persist through the iterations? I would love to know what "unexpected things" I might expect when operating on data frames using loops.
and / or
b) A more elegant solution. I struggle to use apply, ddply and the like when doing anything very complex and maybe I can learn from this example.
Thank you!
Jared
No need to use a loop here.
You just need to replace 5 Jan (day 5) with 1 Aug (day 213):
library(plyr)
## Rows are assumed to be sorted by DayOfYear within each Year.
## rev(cumsum(rev(x))) gives the sum from each row forward to the last one.
ddply(dat, .(Year), transform,
      TotPrecp = ifelse(DayOfYear > 5, NA,
                        rev(cumsum(rev(ifelse(DayOfYear > 5, 0, Precip))))))
Here is the result:
Site Date Year DayOfYear DayofRun AveTemp MaxTemp MinTemp Precip TotPre TotPrecp
1 EelRiver 1/1/02 2002 1 1 53.6 57 51 1.01 NA 3.40
2 EelRiver 1/2/02 2002 2 2 52.5 64 43 1.30 NA 2.39
3 EelRiver 1/3/02 2002 3 3 46.6 60 42 0.56 NA 1.09
4 EelRiver 1/4/02 2002 4 4 45.7 57 41 0.00 NA 0.53
5 EelRiver 1/5/02 2002 5 5 51.0 57 46 0.53 NA 0.53
6 EelRiver 1/6/02 2002 6 6 57.9 60 55 1.70 NA NA
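The same idea, a reverse cumulative sum within each year, can also be sketched in base R. The helper name `precip_to_cutoff` and the toy data frame `toy` are illustrative and not part of the original post, and the sketch assumes rows are sorted by `DayOfYear` within each `Year`.

```r
# Hypothetical helper: total precipitation from each day through `cutoff`
# (inclusive), computed separately for each year.
# Assumes rows are sorted by DayOfYear within each Year.
precip_to_cutoff <- function(df, cutoff = 213) {
  # Zero out days after the cutoff so they do not contribute to the sum
  p <- ifelse(df$DayOfYear <= cutoff, df$Precip, 0)
  # Reverse, accumulate, reverse again: a suffix sum within each year
  tot <- ave(p, df$Year, FUN = function(v) rev(cumsum(rev(v))))
  # Days after the cutoff have no "rain still to come" value
  tot[df$DayOfYear > cutoff] <- NA
  tot
}

# Toy data built from the six rows shown in the question, with cutoff = 5
toy <- data.frame(
  Year      = 2002,
  DayOfYear = 1:6,
  Precip    = c(1.01, 1.30, 0.56, 0.00, 0.53, 1.70)
)
toy$TotPre <- precip_to_cutoff(toy, cutoff = 5)
print(toy)
```

With a column like this in place, the planting-date question would reduce to finding, for each year, the largest DayOfYear whose TotPre is still at least 10.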
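To answer your question about loops: the main danger is their side effects. An assignment in a for body happens in the enclosing environment, whereas an assignment inside a function passed to lapply stays local to that function:
for (i in 1:10) x <- 2 ## creates (or overwrites) x in the current environment, the global one at top level
lapply(1:10, function(z) x <- 2) ## safe: x is local to the function, so no global x is created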
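A small sketch of that scoping difference, using a throwaway variable `x` chosen for illustration. It also suggests that an assignment made in a for body persists after the loop, so a data frame modified inside a loop keeps its changes.

```r
x <- 0
for (i in 1:3) x <- i                            # assigns x in the current environment
x_after_for <- x                                 # the loop's last assignment persists

invisible(lapply(1:3, function(z) x <- z * 10))  # this x is local to each function call
x_after_lapply <- x                              # the outer x is untouched

print(c(x_after_for, x_after_lapply))
```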
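I did not run your code, but the loop header should be for (i in 1:nrow(eelriverdata)) {
instead of for (i in nrow(eelriverdata)) {
because nrow() returns a single number, so the original loop runs only once, for the last row. That is why only the last value of TotPre gets filled in.
Below is my version, which loops over years rather than over all rows. I am unclear about some parts of the question, but try this approach: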
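set.seed(5)
## Simulated data: five years of random daily precipitation
tempdf=data.frame(year=rep(2002:2006, each=365), dayofyear=rep(1:365, times=5), prec=runif(365*5), totpre=0)
years=unique(tempdf$year)
for (i in seq_along(years)){
## Row of day 213 (Aug 1) in this year
totpreindex<-which(tempdf[,"year"]==years[i] & tempdf[,"dayofyear"]==213)
## Total precipitation over days 1 to 212 of this year
totpre<-sum(tempdf[tempdf$year==years[i] & tempdf$dayofyear>0 & tempdf$dayofyear<213,"prec"])
## Store the yearly total on the day-213 row
tempdf[totpreindex,"totpre"]<-totpre
}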
output:
> tempdf[tempdf$totpre>0,]
year dayofyear prec totpre
213 2002 213 0.4094868 108.9317
578 2003 213 0.2037912 109.2401
943 2004 213 0.3949180 112.0684
1308 2005 213 0.6600369 107.0455
1673 2006 213 0.5524957 102.6835
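For completeness, here is a sketch of the question's original row-by-row loop with the iteration fixed, run on made-up data. The data frame `demo` and its values are invented for illustration; only the column names and the day-213 cutoff come from the question. It assumes each year contains exactly one row with `DayOfYear == 213`.

```r
# Invented data: two years, a constant 0.1 inches of rain per day
demo <- data.frame(
  Year      = rep(2002:2003, each = 365),
  DayOfYear = rep(1:365, times = 2),
  Precip    = 0.1,
  TotPre    = NA_real_
)

# seq_len(nrow(demo)) yields 1, 2, ..., nrow(demo);
# nrow(demo) on its own is a single number, so the loop would run once
for (i in seq_len(nrow(demo))) {
  # Row index of Aug 1 (day 213) in the same year as row i
  aug_index <- which(demo$Year == demo$Year[i] & demo$DayOfYear == 213)
  # <= is used here so Aug 1 counts its own rain; the question used <,
  # which leaves Aug 1 itself at 0
  if (i <= aug_index) {
    demo$TotPre[i] <- sum(demo$Precip[i:aug_index])
  } else {
    demo$TotPre[i] <- 0
  }
}

print(demo[c(1, 213, 214, 366), ])
```

Every row ends up populated, which suggests the data frame itself was never the problem in the original loop, only the sequence being iterated over.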