
R - Error in For Loops and If Statements on list of Data Frames: Subscript Out of Bounds

I'm using R to create an occupancy model encounter history. I need to take a list of bird counts for individual leks, separate them by year, then code the count dates into two intervals: less than 10 days after the first count (Interval 1), or 10 or more days after the first count (Interval 2). For any year where only one count occurred, I need to add an entry coded as "U" to indicate that no count occurred during the second interval. After that, I need to extract only the max count in each year and interval. A sample dataset:

 ComplexId       Date Males Year category
        57 1941-04-15    97 1941        A
        57 1942-04-15    67 1942        A
        57 1943-04-15    44 1943        A
        57 1944-04-15    32 1944        A
        57 1946-04-15    21 1946        A
        57 1947-04-15    45 1947        A
        57 1948-04-15    67 1948        A
        57 1989-03-21    25 1989        A
        57 1989-03-30    41 1989        A
        57 1989-04-13     2 1989        A
        57 1991-03-06    35 1991        A
        57 1991-04-04    43 1991        A
        57 1991-04-11    37 1991        A
        57 1991-04-22    25 1991        A
        57 1993-03-23     6 1993        A
        57 1994-03-06    17 1994        A
        57 1994-03-11    10 1994        A
        57 1994-04-06    36 1994        A
        57 1994-04-15    29 1994        A
        57 1994-04-21    27 1994        A
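To reproduce the example, the table can be rebuilt as a data frame named c1, with Date stored as class Date and category as character. This is a sketch; the column order is kept exactly as in the table because the code below indexes columns by position ([,2], [1,4], [,6]).

```r
# Rebuild the sample counts for lek (ComplexId) 57 as a data frame named c1.
dates <- as.Date(c("1941-04-15", "1942-04-15", "1943-04-15", "1944-04-15",
                   "1946-04-15", "1947-04-15", "1948-04-15",
                   "1989-03-21", "1989-03-30", "1989-04-13",
                   "1991-03-06", "1991-04-04", "1991-04-11", "1991-04-22",
                   "1993-03-23",
                   "1994-03-06", "1994-03-11", "1994-04-06", "1994-04-15",
                   "1994-04-21"))

c1 <- data.frame(
  ComplexId = 57,                                   # recycled to every row
  Date      = dates,                                # class Date, not character
  Males     = c(97, 67, 44, 32, 21, 45, 67, 25, 41, 2,
                35, 43, 37, 25, 6, 17, 10, 36, 29, 27),
  Year      = as.integer(format(dates, "%Y")),      # derived from the date
  category  = "A",
  stringsAsFactors = FALSE                          # keep category as character
)
str(c1)
```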

Here is the code I wrote for this, with the data frame above named "c1" (you'll need to coerce the Date column to class Date and the category column to character):

library(plyr) # for ddply()

c1_Year<-lapply(unique(c1$Year), function(x) c1[c1$Year == x,]) #splits complex counts into list by year

for(i in 1:length(c1_Year)){
  c1_Year[[i]]<-cbind(c1_Year[[i]], daydiff = as.numeric(c1_Year[[i]][,2]-c1_Year[[i]][1,2]))
} #adds column with difference between first survey and subsequent surveys

for(i in 1:length(c1_Year)){
  c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
    rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
} # adds U values to years with only 1 count, while coercing the "U" into the appropriate interval

for(i in 1:length(c1_Year)){
  c1_Year[[i]]$Interval<- ifelse(c1_Year[[i]][,6] < 10, 1, 2)
} # adds interval code for each survey: 1 = less than 10 days after the first count, 2 = 10 or more days after it

for(i in 1:length(c1_Year)){
  c1_Year[[i]]<-ddply(.data=c1_Year[[i]], .(Interval), subset, Males==max(Males)) 
} # subsets out max count in each interval
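To make the interval rule concrete, here is the same arithmetic done by hand for the five 1994 counts of lek 57 from the table. It is only an illustration of the daydiff, Interval and max-count steps, not a replacement for the loops; note that the `< 10` test puts a count made exactly 10 days after the first one into Interval 2.

```r
# The five 1994 counts for lek 57, copied from the sample table.
dates <- as.Date(c("1994-03-06", "1994-03-11", "1994-04-06",
                   "1994-04-15", "1994-04-21"))
males <- c(17, 10, 36, 29, 27)

# Days since the first count of the year (the daydiff column).
daydiff <- as.numeric(dates - dates[1])
daydiff   # 0 5 31 40 46

# Interval 1 if fewer than 10 days after the first count, otherwise 2.
interval <- ifelse(daydiff < 10, 1, 2)
interval  # 1 1 2 2 2

# Largest count within each interval.
interval_max <- tapply(males, interval, max)
interval_max  # interval 1: 17, interval 2: 36
```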

The problem arises in the second for loop. With options(error=recover) enabled, it returns:

Error in c1_Year[[i]] : subscript out of bounds
No suitable frames for recover()

Even though the error is raised, the loop seems to do what it was supposed to: the extra rows with the "U" code are still appended to the data frames for years with only one count. The issue is that I have 750 leks to do this for, so I tried to wrap the code above in a function. However, when I run the function on any data, the subscript out of bounds error stops it. I could brute-force it and run the code above for each lek manually, but I was hoping there might be a more elegant solution. Why am I getting the subscript out of bounds error, and how can I fix it?

Here's the function I wrote, so that you can see that it doesn't work:

create.OEH<-function(dataset, final_dataframe){
  c1_Year<-lapply(unique(dataset$Year), function(x) dataset[dataset$Year == x,]) #splits complex counts into list by year

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-cbind(c1_Year[[i]], daydiff = as.numeric(c1_Year[[i]][,2]-c1_Year[[i]][1,2]))
  } #adds column with difference between first survey and subsequent surveys

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
      rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
  } # adds U values to years with only 1 count

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]$Interval<- ifelse(c1_Year[[i]][,6] < 10, 1, 2)
  } # adds interval code for each survey: 1 = less than 10 days after the first count, 2 = 10 or more days after it

  for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-ddply(.data=c1_Year[[i]], .(Interval), subset, Males==max(Males)) 
  } #subset out max count for each interval

  df<-rbind.fill(c1_Year) #collapse list into single dataframe

  final_dataframe<-df[!duplicated(df[,c("Year", "Interval")]),] #remove ties for max count

}

In this bit of code

for(i in 1:length(c1_Year)){
    c1_Year[[i]]<-if(length(c1_Year[[i]][,1]) == 1)
      rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
  } 

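You are assigning NULL whenever length(c1_Year[[i]][,1]) == 1 is not true (an if without an else evaluates to NULL when its condition is FALSE), and assigning NULL to a list element removes that element from c1_Year entirely. The list therefore shrinks while i keeps counting up to the original length(c1_Year), so c1_Year[[i]] eventually points past the end of the list: that is the "subscript out of bounds" error.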
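The mechanism can be reproduced on a toy list. This is a sketch with a hypothetical three-element list standing in for c1_Year (only the second element has a single row); it shows that if without else yields NULL, that assigning NULL deletes an element, and that the question's loop pattern then runs off the end of the shrinking list.

```r
# Toy stand-in for c1_Year: only the second element has a single row.
years <- list(data.frame(n = 1:2), data.frame(n = 1), data.frame(n = 1:3))

# An if () with no else evaluates to NULL when its condition is FALSE.
stopifnot(is.null(if (FALSE) "never returned"))

# Assigning NULL with [[<- deletes the element, so the list gets shorter.
shrunk <- years
shrunk[[1]] <- NULL
length(shrunk)  # 2

# The loop pattern from the question: 1:length(buggy) is fixed at 1:3 up front,
# but elements are deleted as the loop runs, so i = 3 no longer exists.
buggy <- years
err <- tryCatch({
  for (i in 1:length(buggy)) {
    buggy[[i]] <- if (nrow(buggy[[i]]) == 1) buggy[[i]]
  }
  "no error"
}, error = function(e) conditionMessage(e))
err  # "subscript out of bounds"
```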

You probably want

for(i in 1:length(c1_Year)){
    if (length(c1_Year[[i]][,1]) == 1) {
        c1_Year[[i]] <- rbind(c1_Year[[i]], c(c1_Year[[i]][1,1], NA, 0, c1_Year[[i]][1,4], "U", 11)) 
    }
  } 
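The corrected pattern can be checked on the same kind of toy list (hypothetical data, with a placeholder NA row instead of the "U" row). This sketch also swaps 1:length() for seq_along(), which is the safer idiom because it yields an empty sequence for an empty list instead of 1:0.

```r
# Toy stand-in for c1_Year: only the second element has a single row.
years <- list(data.frame(n = 1:2), data.frame(n = 1), data.frame(n = 1:3))

for (i in seq_along(years)) {
  # Only reassign when the condition holds; other elements are left untouched.
  if (nrow(years[[i]]) == 1) {
    years[[i]] <- rbind(years[[i]], data.frame(n = NA))
  }
}

length(years)          # still 3
sapply(years, nrow)    # 2 2 3
```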

However, since you are already using ddply, you can avoid a lot of the repetition. ddply(c1, .(Year), ...) splits c1 by Year, applies the function to each subset, and binds the results back into a single data frame.

library(plyr)

c2 <- ddply(c1,
            .(Year),
            function (x) {
                # create 'Interval' (assumes rows are sorted by Date within each year)
                x$Interval <- ifelse(as.numeric(x$Date - x$Date[1]) < 10, 1, 2)
                # extract max males per interval
                o <- ddply(x, .(Interval), subset, Males==max(Males))
                # add a 'U' row if there is no interval-2 count
                if (all(o$Interval != 2)) {
                    # [1] keeps every list element length 1 even if o has tied rows
                    o <- rbind(o,
                               list(o$ComplexId[1], NA, 0, o$Year[1], 'U', 2))
                }
                # return the resulting dataframe
                o
            })

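I converted your rbind(.., c(...)) to rbind(.., list(...)). c() has to return a single atomic type, so mixing numbers with "U" coerces every value to character, and rbind() then turns the affected numeric columns into character columns. A list can hold elements of different types, so each column keeps its own type.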
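A small sketch of the difference, using a hypothetical one-row data frame with the same column layout as the question's data (the 1993 count for lek 57): c() produces a character vector, while rbind() with a list keeps Males numeric and Date as class Date.

```r
# A single-count year, in the same column layout as the question's data.
one <- data.frame(ComplexId = 57, Date = as.Date("1993-03-23"), Males = 6,
                  Year = 1993, category = "A", stringsAsFactors = FALSE)

# c() must return one atomic type, so every value is coerced to character.
as_vector <- c(one$ComplexId, NA, 0, one$Year, "U")
as_vector  # "57" NA "0" "1993" "U"

# A list keeps each element's own type, so the columns keep theirs too.
with_list <- rbind(one, list(one$ComplexId, NA, 0, one$Year, "U"))
sapply(with_list, class)
```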

Otherwise the code is almost the same as yours.
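Since the question mentions 750 leks, the same per-year logic can be grouped by ComplexId and Year in one pass instead of calling a function once per lek. The sketch below swaps plyr's ddply for base R's split(), lapply() and do.call(rbind, ...) so it runs without extra packages, and keeps only the first row when counts tie for the max (like the question's !duplicated() step). max_by_interval and create_oeh are hypothetical names, and the code assumes the columns are ComplexId, Date (class Date), Males, Year and category (character), in that order.

```r
# Hypothetical helper: for one lek-year, keep the first row with the max count
# in each interval, then append a "U" row when there is no Interval-2 count.
max_by_interval <- function(x) {
  x <- x[order(x$Date), ]
  x$Interval <- ifelse(as.numeric(x$Date - x$Date[1]) < 10, 1, 2)
  per_interval <- lapply(split(x, x$Interval),
                         function(g) g[which.max(g$Males), ])
  out <- do.call(rbind, per_interval)
  if (!any(out$Interval == 2)) {
    # Positional list: ComplexId, Date, Males, Year, category, Interval.
    out <- rbind(out, list(out$ComplexId[1], NA, 0, out$Year[1], "U", 2))
  }
  out
}

# Hypothetical driver: split by lek and year, process each piece, recombine.
create_oeh <- function(dataset) {
  pieces <- split(dataset, list(dataset$ComplexId, dataset$Year), drop = TRUE)
  res <- do.call(rbind, lapply(pieces, max_by_interval))
  res <- res[order(res$ComplexId, res$Year, res$Interval), ]
  rownames(res) <- NULL
  res
}

# Usage on the 1993 and 1994 counts for lek 57 from the sample table.
demo_dates <- as.Date(c("1993-03-23", "1994-03-06", "1994-03-11",
                        "1994-04-06", "1994-04-15", "1994-04-21"))
demo <- data.frame(ComplexId = 57, Date = demo_dates,
                   Males = c(6, 17, 10, 36, 29, 27),
                   Year = as.integer(format(demo_dates, "%Y")),
                   category = "A", stringsAsFactors = FALSE)
result <- create_oeh(demo)
result
```

Here 1993 has a single count, so it gets a "U" row in Interval 2, while 1994 keeps its Interval 1 maximum (17) and Interval 2 maximum (36).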
