
Having trouble in R with nested for loops and an external counter

I'm an R novice with experience in Python and C++, trying to do something that makes sense to me in those languages but apparently doesn't work in R. I've got a JSON array of nested objects that I need to pull data from, and I need to keep the extracted values synchronized across separate vectors so I can build a new data frame and plot the data.

My data looks like this: {URL:[data], ... {VisitHistory:{0:[number], 1:[number]}}}

I'm trying to put this into tabular format, where I get one row for each entry in the VisitHistory array, with each of those rows sharing the same URL.
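For example, if one URL had two entries in VisitHistory, I'd want two rows like this (made-up values):

url       views  date
some-url  12     1001
some-url  7      1002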

Here's what I have so far:

library(rjson)  # provides fromJSON(file = ...)

# vectors to collect each column; iter keeps them aligned
url <- c()
views <- c()
date <- c()
iter <- 1

# bring in data
output <- fromJSON(file = 'filename')

# generate vectors for each variable of interest
for (n in 1:length(output)) {
  for (x in 1:length(output[[n]]$th)) {
    url[iter] <- output[[n]]$url
    if (!is.null(output[[n]]$th[[x]]$sh[[1]])) {
      views[iter] <- output[[n]]$th[[x]]$sh[[1]]
    } else {
      views[iter] <- -1
    }
    date[iter] <- output[[n]]$th[[x]]$ts[[1]]
    iter <- iter + 1
  }
  iter <- iter + 1
}

I'm trying to use iter to make sure that url, views, and date all stay synchronized in their respective vectors until I merge them into their own data frame. However, with iter as the assignment index in that block, the loop seems to run forever, and I can't figure out why.

I appreciate your help!

Have you tried printing the iter variable inside the loop to see if it is actually going through the iterations or halting on something? Maybe your file is just huge. I am not providing a solution, just a way to help you debug this.

Also, you are growing the vectors dynamically inside the for loop, which makes things slow. Try preallocating fixed-size vectors (e.g. with rep or seq) at the beginning and breaking out of the loops once iter has exhausted their size. If that works, then you know that running time, not an infinite loop, is the issue. E.g.:

# avoid dynamic allocation, which is slow, by preallocating memory
# (10 is just a small test size)
url <- rep(0, 10)
views <- rep(0, 10)
date <- rep(0, 10)
iter <- 1

# bring in data
output <- fromJSON(file = 'filename')

# generate vectors for each variable of interest
for (n in 1:length(output)) {
  for (x in 1:length(output[[n]]$th)) {
    print(iter)  # print the progression
    url[iter] <- output[[n]]$url
    if (!is.null(output[[n]]$th[[x]]$sh[[1]])) {
      views[iter] <- output[[n]]$th[[x]]$sh[[1]]
    } else {
      views[iter] <- -1
    }
    date[iter] <- output[[n]]$th[[x]]$ts[[1]]
    iter <- iter + 1
    if (iter > 10) break
  }
  iter <- iter + 1
  if (iter > 10) break
}

You might also want to consider defining a function for what you want and applying it over the list you have, e.g. with the plyr package; a rough sketch of that follows the snippet below. But first try what I added above and see if that works. Also, to find the total number of iterations for the preallocation, you can do something like:

maxiter <- 0
for (i in 1:length(output)) {
  maxiter <- maxiter + length(output[[i]]$th)
}
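Equivalently, assuming every element of output has a th list, base R can compute that total in one line:

maxiter <- sum(sapply(output, function(o) length(o$th)))

And here is a rough sketch of the function-plus-apply approach mentioned above, in base R (plyr's ldply would look much the same); it assumes the same field names (url, th, sh, ts) as your code:

# build one small data frame per top-level element, then stack them;
# data.frame() recycles the single url across all rows for that element
rows <- lapply(output, function(o) {
  data.frame(
    url   = o$url,
    views = sapply(o$th, function(t) if (!is.null(t$sh[[1]])) t$sh[[1]] else -1),
    date  = sapply(o$th, function(t) t$ts[[1]])
  )
})
result <- do.call(rbind, rows)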

Also, why are you incrementing the iter variable in the outer loop? You only need to increment it in the innermost loop. The extra increment skips an index on every pass through the outer loop, which leaves gaps (NA entries) in your vectors.
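For reference, a minimal sketch with the increment only in the inner loop, using maxiter from above for the preallocation and building the data frame at the end (same field names as your code assumed):

url <- rep(NA_character_, maxiter)
views <- rep(-1, maxiter)  # -1 is your original "missing" marker
date <- rep(NA, maxiter)
iter <- 1

for (n in 1:length(output)) {
  for (x in 1:length(output[[n]]$th)) {
    url[iter] <- output[[n]]$url
    if (!is.null(output[[n]]$th[[x]]$sh[[1]])) {
      views[iter] <- output[[n]]$th[[x]]$sh[[1]]
    }
    date[iter] <- output[[n]]$th[[x]]$ts[[1]]
    iter <- iter + 1  # increment once per row, only here
  }
}

result <- data.frame(url = url, views = views, date = date)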
