
Preserving changes to R data frame while iterating through for loop

I'm new to Stackexchange, so I apologize in advance if I ask this question incorrectly.

Here is the background. I am trying to establish a recommended planting date for wheat, based on the last day in spring where one could reasonably expect to see at least 10 more inches of rain before the dry summer begins.

I have a dataset that looks like this:

    Site   Date Year DayOfYear DayofRun AveTemp MaxTemp MinTemp Precip TotPre
1 EelRiver 1/1/02 2002         1        1    53.6      57      51   1.01     NA
2 EelRiver 1/2/02 2002         2        2    52.5      64      43   1.30     NA
3 EelRiver 1/3/02 2002         3        3    46.6      60      42   0.56     NA
4 EelRiver 1/4/02 2002         4        4    45.7      57      41   0.00     NA
5 EelRiver 1/5/02 2002         5        5    51.0      57      46   0.53     NA
6 EelRiver 1/6/02 2002         6        6    57.9      60      55   1.70     NA

What I want to do is populate the column TotPre with the total precipitation from that date forward up to Aug 1st.

I know that, ideally, I would avoid explicit loops, but I was stumped by the fact that it seems like I need to calculate a sum on a subset that varies based on what observation I'm working with. So, using a for loop, here was how I attempted to do it:

eelriverdata <- read.csv(file="EelRiverCamp.csv",head=TRUE,sep=",")

for (i in nrow(eelriverdata)) {

    tempYear <- eelriverdata[i,"Year"]
    AugIndex <- which(eelriverdata[,"Year"]==tempYear & eelriverdata[,"DayOfYear"] == 213)

    if (i < AugIndex) {
        Tot <- sum(eelriverdata[i:AugIndex,"Precip"])
        eelriverdata$TotPre[i] <- Tot
    }

    else {eelriverdata$TotPre[i] <- 0}

}

The problem I faced was that only the last observation in TotPre would end up populated at the end of executing the loop, with the rest of the values remaining NA. It seems that the value is either lost or overwritten with each iteration of the for loop. I did some research, but couldn't find anything other than the mysterious info that for loops do "unexpected things" with data frames.

So, does anyone know:

a) How to make the changes to the data frame persist through the iterations? I would love to know what "unexpected things" I might expect when operating on data frames using loops.

and / or

b) A more elegant solution. I struggle to use apply, ddply and the like when doing anything very complex and maybe I can learn from this example.

Thank you!

Jared

No need to use a loop here.

  1. Use ddply/transform to group by year and get a data.frame as a result
  2. Use cumsum to compute the cumulative precipitation
  3. Use rev to reverse the order so the totals run forward

In the example below the cutoff is Jan 5th (day 5); you just need to change it to Aug 1st (day 213):

library(plyr)
ddply(dat,.(Year),transform, 
     TotPrecp= ifelse(DayOfYear > 5, NA,rev(cumsum(Precip))))

Here is the result:

  Site   Date Year DayOfYear DayofRun AveTemp MaxTemp MinTemp Precip TotPre TotPrecp
1 EelRiver 1/1/02 2002         1        1    53.6      57      51   1.01     NA     5.10
2 EelRiver 1/2/02 2002         2        2    52.5      64      43   1.30     NA     3.40
3 EelRiver 1/3/02 2002         3        3    46.6      60      42   0.56     NA     2.87
4 EelRiver 1/4/02 2002         4        4    45.7      57      41   0.00     NA     2.87
5 EelRiver 1/5/02 2002         5        5    51.0      57      46   0.53     NA     2.31
6 EelRiver 1/6/02 2002         6        6    57.9      60      55   1.70     NA       NA
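
Note that rev(cumsum(Precip)) is the running total read backwards, so row 1 receives the total for the whole group (5.10 here includes day 6). If the goal is the sum from each day forward through the cutoff, a base R sketch might look like the following. The helper forward_sum and the variable cutoff are names made up for this illustration, and the toy data simply reuses the six sample rows.

```r
# Toy data: Year, DayOfYear and Precip from the six sample rows in the question.
dat <- data.frame(
  Year      = rep(2002, 6),
  DayOfYear = 1:6,
  Precip    = c(1.01, 1.30, 0.56, 0.00, 0.53, 1.70)
)

cutoff <- 5  # illustrative; the real data would use 213 (Aug 1st in a non-leap year)

# Sum of x from each position through the end of the vector.
forward_sum <- function(x) rev(cumsum(rev(x)))

# Zero out rain after the cutoff so it does not contribute,
# then take the forward sum separately within each year.
in_window <- dat$DayOfYear <= cutoff
dat$TotPre <- ave(ifelse(in_window, dat$Precip, 0), dat$Year, FUN = forward_sum)

# Rows after the cutoff get NA, as in the answer above.
dat$TotPre[!in_window] <- NA

print(dat)
```

This assumes the rows are already sorted by date within each year; if they are not, sort the data frame first.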

To answer your question about loops: they are mainly dangerous because of their side effects:

for (i in 1:10) x <- 2             ## create a global variable x
lapply (1:10, function(z) x <- 2)  ## SAFE: does not create a global variable x
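
To see the difference concretely, here is a small sketch that runs each form in its own scratch environment and then checks whether x exists there. The names env_loop and env_apply are only illustrative. It also suggests why assignments to a data frame made inside a for loop do persist: the loop body runs in the calling environment, not in a separate function scope.

```r
# Run the for loop inside a scratch environment.
env_loop <- new.env()
evalq(for (i in 1:10) x <- 2, env_loop)

# Run the lapply version inside another scratch environment.
env_apply <- new.env()
evalq(invisible(lapply(1:10, function(z) x <- 2)), env_apply)

# The loop leaves x (and i) behind; the function body in lapply does not.
loop_made_x  <- exists("x", envir = env_loop,  inherits = FALSE)
apply_made_x <- exists("x", envir = env_apply, inherits = FALSE)

print(c(loop = loop_made_x, lapply = apply_made_x))
```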

I did not check the rest of your code, but it should be for (i in 1:nrow(eelriverdata)) { instead of for (i in nrow(eelriverdata)) {
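
A quick way to see why this matters: nrow() returns a single number, so for (i in n) runs the body exactly once, with i equal to n. That matches the symptom of only the last row being filled in. The sketch below records which indices each form visits; the variable names are made up for the illustration, and seq_len(n) is used as a common alternative to 1:n that also behaves sensibly when n is 0.

```r
n <- 5  # stands in for nrow(eelriverdata)

# Looping over the single value n: one iteration only.
visited_single <- c()
for (i in n) visited_single <- c(visited_single, i)

# Looping over the sequence 1..n: one iteration per row.
visited_all <- c()
for (i in seq_len(n)) visited_all <- c(visited_all, i)

print(visited_single)
print(visited_all)
```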

Below is my version, which loops only over the years rather than over all rows.

I am unclear about some parts of the question, but try this approach:

set.seed(5)
tempdf=data.frame(year=rep(2002:2006, each=365), dayofyear=rep(1:365, times=5), prec=runif(365*5), totpre=0)

years=unique(tempdf$year)
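for (i in seq_along(years)){
    # row holding day 213 (Aug 1st) of this year
    totpreindex<-which(tempdf[,"year"]==years[i] & tempdf[,"dayofyear"]==213)
    # total precipitation for days 1 to 212 of this year
    totpre<-sum(tempdf[tempdf$year==years[i] & tempdf$dayofyear>0  & tempdf$dayofyear<213,"prec"])
    # store the total on the day-213 row
    tempdf[totpreindex,"totpre"]<-totpre
}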

Output:

> tempdf[tempdf$totpre>0,]
     year dayofyear      prec   totpre
213  2002       213 0.4094868 108.9317
578  2003       213 0.2037912 109.2401
943  2004       213 0.3949180 112.0684
1308 2005       213 0.6600369 107.0455
1673 2006       213 0.5524957 102.6835
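
For comparison, the same per-year totals (days 1 to 212) can probably be computed without an explicit loop using tapply. This is only a sketch on the same simulated data; the names before_aug and totals are made up for the illustration.

```r
# Same simulated data as above (without the totpre column).
set.seed(5)
tempdf <- data.frame(year = rep(2002:2006, each = 365),
                     dayofyear = rep(1:365, times = 5),
                     prec = runif(365 * 5))

# Keep only the days before day 213 (Aug 1st in a non-leap year).
before_aug <- tempdf$dayofyear < 213

# One sum per year, named by year.
totals <- tapply(tempdf$prec[before_aug], tempdf$year[before_aug], sum)

print(totals)
```

The result is a named vector with one entry per year, which can be merged back onto the day-213 rows if needed.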
