
Preserving changes to R data frame while iterating through for loop

I'm new to Stackexchange, so I apologize in advance if I ask this question incorrectly.

Here is the background. I am trying to establish a recommended planting date for wheat, based on the last day in spring on which one could reasonably expect to see at least 10 more inches of rain before the dry summer begins.

I have a dataset that looks like this:

    Site   Date Year DayOfYear DayofRun AveTemp MaxTemp MinTemp Precip TotPre
1 EelRiver 1/1/02 2002         1        1    53.6      57      51   1.01     NA
2 EelRiver 1/2/02 2002         2        2    52.5      64      43   1.30     NA
3 EelRiver 1/3/02 2002         3        3    46.6      60      42   0.56     NA
4 EelRiver 1/4/02 2002         4        4    45.7      57      41   0.00     NA
5 EelRiver 1/5/02 2002         5        5    51.0      57      46   0.53     NA
6 EelRiver 1/6/02 2002         6        6    57.9      60      55   1.70     NA

What I want to do is populate the column TotPre with the total precipitation from that date forward up to Aug 1st.

I know that, ideally, I would avoid explicit loops, but I was stumped by the fact that it seems like I need to calculate a sum on a subset that varies based on what observation I'm working with. So, using a for loop, here was how I attempted to do it:

eelriverdata <- read.csv(file="EelRiverCamp.csv",head=TRUE,sep=",")

for (i in nrow(eelriverdata)) {

    tempYear <- eelriverdata[i,"Year"]
    AugIndex <- which(eelriverdata[,"Year"]==tempYear & eelriverdata[,"DayOfYear"] == 213)

    if (i < AugIndex) {
        Tot <- sum(eelriverdata[i:AugIndex,"Precip"])
        eelriverdata$TotPre[i] <- Tot
    }

    else {eelriverdata$TotPre[i] <- 0}

}

The problem I faced was that only the last observation in TotPre ended up populated after the loop finished, with the rest of the values remaining NA. It seems that the value is either lost or overwritten on each iteration of the for loop. I did some research, but couldn't find anything other than the mysterious claim that for loops do "unexpected things" with data frames.

So, does anyone know:

a) How to make the changes to the data frame persist through the iterations? I would love to know what "unexpected things" I might expect when operating on data frames using loops.

and / or

b) A more elegant solution. I struggle to use apply, ddply and the like when doing anything very complex and maybe I can learn from this example.

Thank you!

Jared

No need to use a loop here.

  1. Use ddply with transform to group by year and get a data.frame back.
  2. Use cumsum to compute the cumulative precipitation.
  3. Use rev so that the sum runs from each date forward rather than from the start of the year.

The example below uses 5 January as the cutoff so that it fits the six sample rows; for the real data, replace 5 with 213 (1 August). Precip has to be reversed before cumsum and again after it, and rain after the cutoff is zeroed so that it does not enter the sums:

library(plyr)
ddply(dat, .(Year), transform,
      TotPrecp = ifelse(DayOfYear > 5, NA,
                        rev(cumsum(rev(ifelse(DayOfYear > 5, 0, Precip))))))

Here is the result:

      Site   Date Year DayOfYear DayofRun AveTemp MaxTemp MinTemp Precip TotPre TotPrecp
1 EelRiver 1/1/02 2002         1        1    53.6      57      51   1.01     NA     3.40
2 EelRiver 1/2/02 2002         2        2    52.5      64      43   1.30     NA     2.39
3 EelRiver 1/3/02 2002         3        3    46.6      60      42   0.56     NA     1.09
4 EelRiver 1/4/02 2002         4        4    45.7      57      41   0.00     NA     0.53
5 EelRiver 1/5/02 2002         5        5    51.0      57      46   0.53     NA     0.53
6 EelRiver 1/6/02 2002         6        6    57.9      60      55   1.70     NA       NA
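The same per-year, date-forward total can also be sketched in base R with ave(), here with the real cutoff of day 213. This is only a sketch: the data frame below is synthetic, the column name TotPre is reused from the question, and the rows are assumed to be sorted by date within each year.

```r
# Synthetic data standing in for the real file (hypothetical values).
set.seed(1)
dat <- data.frame(
  Year      = rep(2002:2003, each = 365),
  DayOfYear = rep(1:365, times = 2),
  Precip    = round(runif(730), 2)
)

cutoff <- 213  # 1 August in a non-leap year

# Zero out rain after the cutoff so it never enters the sums.
p <- ifelse(dat$DayOfYear <= cutoff, dat$Precip, 0)

# Reverse, accumulate, reverse again: each row gets the sum from that
# row to the end of its year group.
dat$TotPre <- ave(p, dat$Year, FUN = function(x) rev(cumsum(rev(x))))

# Rows after the cutoff have no meaningful forward total.
dat$TotPre[dat$DayOfYear > cutoff] <- NA
```

Whether days after the cutoff should hold NA or 0 (as in the question's loop) is a matter of taste; change the last line accordingly.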

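To answer your question about loops: they are mainly considered dangerous because of their side effects. A for loop assigns in the environment it runs in, whereas a function passed to lapply assigns locally:

for (i in 1:10) x <- 2             ## creates (or overwrites) a global variable x
lapply(1:10, function(z) x <- 2)   ## safe: does not create a global variable x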
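The scoping difference can be checked directly. The sketch below uses made-up variable names (loop_var, fun_var) and assumes neither exists beforehand. Note that this side effect does not by itself explain the lost values in the question: assignments to a data frame made inside a for loop do persist after the loop ends.

```r
# A for loop runs in the calling environment, so its variables survive it.
for (i in 1:3) loop_var <- i * 2
exists("loop_var")   # TRUE: loop_var holds the value from the last iteration
i                    # the loop index also survives, with its last value

# The function body passed to lapply has its own environment.
res <- lapply(1:3, function(z) {
  fun_var <- z * 2   # local to this call
  fun_var
})
exists("fun_var")    # FALSE, assuming no fun_var was defined earlier
```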

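I did not run your code, but the loop header should be for (i in 1:nrow(eelriverdata)) { instead of for (i in nrow(eelriverdata)) {. As written, i takes only the single value nrow(eelriverdata), so the loop body runs once, for the last row, which is why only the last observation gets filled.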
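The difference between the two loop headers can be checked on a toy data frame. The sketch below makes a few assumptions of its own: the data frame df and the cutoff of day 5 are stand-ins for the real data and day 213, seq_len(nrow(df)) is used in place of 1:nrow(df) because it also behaves sensibly for zero rows, each year is assumed to contain exactly one row for the cutoff day, and the cutoff day's own rain is counted (i <= end_idx rather than i < end_idx).

```r
# Small stand-in for the real data, using the six sample rows.
df <- data.frame(
  Year      = rep(2002, 6),
  DayOfYear = 1:6,
  Precip    = c(1.01, 1.30, 0.56, 0.00, 0.53, 1.70),
  TotPre    = NA_real_
)
cutoff <- 5  # stand-in for day 213 (1 August)

nrow(df)           # a single number: 6
seq_len(nrow(df))  # the row indices 1 to 6

for (i in seq_len(nrow(df))) {
  # Row holding the cutoff day in the same year as row i.
  end_idx <- which(df$Year == df$Year[i] & df$DayOfYear == cutoff)
  if (i <= end_idx) {
    df$TotPre[i] <- sum(df$Precip[i:end_idx])
  } else {
    df$TotPre[i] <- 0
  }
}
```

With the corrected header every row of TotPre is filled, and the values persist after the loop.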

Below is my version, which loops over years rather than over all rows. I am unclear on some parts of the question, but try this approach:

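set.seed(5)
# Synthetic data: five years of random daily precipitation
tempdf <- data.frame(year = rep(2002:2006, each = 365),
                     dayofyear = rep(1:365, times = 5),
                     prec = runif(365 * 5),
                     totpre = 0)

years <- unique(tempdf$year)
for (i in seq_along(years)) {
    # Row of day 213 (1 August) for this year
    totpreindex <- which(tempdf[, "year"] == years[i] & tempdf[, "dayofyear"] == 213)
    # Total precipitation for days 1 to 212 of this year
    totpre <- sum(tempdf[tempdf$year == years[i] & tempdf$dayofyear > 0 & tempdf$dayofyear < 213, "prec"])
    # Store the total on the day-213 row only
    tempdf[totpreindex, "totpre"] <- totpre
}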

output:

> tempdf[tempdf$totpre>0,]
     year dayofyear      prec   totpre
213  2002       213 0.4094868 108.9317
578  2003       213 0.2037912 109.2401
943  2004       213 0.3949180 112.0684
1308 2005       213 0.6600369 107.0455
1673 2006       213 0.5524957 102.6835
