Optimizing a loop in R

Question

I am writing this code to go through data and compare values. This is my code:

for (t in 1:(length(prob_times_start_new))) {
    count3 <- 0
    testcount <- 0
    dates <- c()
    count2 <- 0
    for (n in 1:length(ob_times)) {
        issue <- substr(prob_times_start_new[t], 1, 10)
        issue2 <- substr(prob_times_end_new[t], 1, 10)
        count2 <- count2 + 1
        if (grepl(issue, ob_times[n]) | grepl(issue2, ob_times[n])) {
            if ((ob_times[n] >= prob_times_start_new[t]) & (ob_times[n] <= prob_times_end_new[t])) {
                count3 <- count3 + 1
            }
            if ((ob_times[n] >= prob_times_start_new[t]) & (ob_times[n] <= prob_times_end_new[t]) & (count3 <= 1)) {
                if (probs_new[t] == "PROB30") {
                    num_of_hits30 <- num_of_hits30 + 1
                } else if (probs_new[t] == "PROB40") {
                    num_of_hits40 <- num_of_hits40 + 1
                }
            }
            if ((ob_times[n] < prob_times_start_new[t]) | (ob_times[n] > prob_times_end_new[t])) {
                testcount <- testcount + 1
            }
            dates <- c(dates, ob_times[n])
        }

        nums <- length(ob_times)
        if ((!(grepl(issue, ob_times[nums]))) & (!(grepl(issue2, ob_times[1])))) {
            if (((prob_times_start_new[t] > ob_times[nums]) | (prob_times_end_new[t] < ob_times[1])) & count2 <= 1) {
                if (probs_new[t] == "PROB30") {
                    num_of_false30 <- num_of_false30 + 1
                } else if (probs_new[t] == "PROB40") {
                    num_of_false40 <- num_of_false40 + 1
                }
            }
        }
    }

    if ((!(is.null(dates)))) {
        if ((testcount == length(dates))) {
            if (probs_new[t] == "PROB30") {
                num_of_false30 <- num_of_false30 + 1
            } else if (probs_new[t] == "PROB40") {
                num_of_false40 <- num_of_false40 + 1
            }
        }
    }

    for (k in 2:length(ob_times)) {
        if (((!(grepl(issue, ob_times[k]))) & (!(grepl(issue2, ob_times[k])))) & ((!(grepl(issue, ob_times[k-1]))) & (!(grepl(issue, ob_times[k-1]))))) {
            if ((prob_times_start_new[t] > ob_times[k-1]) & (prob_times_start_new[t] < ob_times[k]) & (prob_times_end_new[t] > ob_times[k-1]) & (prob_times_end_new[t] < ob_times[k])) {
                if (probs_new[t] == "PROB30") {
                    num_of_false30 <- num_of_false30 + 1
                } else if (probs_new[t] == "PROB40") {
                    num_of_false40 <- num_of_false40 + 1
                }
            }
        }
    }
}

prob_times_start_new, prob_times_end_new and ob_times are vectors of strings in this format:

"2010-03-12 22:12:20" (Year-Month-Day Hour:Minute:Second)

probs_new is a vector containing either "PROB30" or "PROB40". num_of_false30, num_of_false40, num_of_hits30 and num_of_hits40 are integers that start at 0 and are incremented according to the criteria in the code.

I know this is a lot of code, so please ask if anything in it is unclear. It is supposed to go through each start/end time interval and check whether anything in ob_times falls inside it: if something does, it is a hit, and if not, it is a false.

Right now the code works, but it takes about 2 minutes to run. It would save me a lot of time if I could make it faster. I saw some posts about vectorization and tried it myself, but had no luck. If someone could help me out, it would be greatly appreciated. Thanks in advance.

Answer 1

Allocate the vectors before using them. For example, you have

dates <- c()

Replace that with

dates <- vector('character', length)

for whatever the length may be (the mode is 'character' because ob_times holds strings). Then, instead of concatenating the dates, assign to the element directly:

dates[n] <- value
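
As a rough illustration of the two patterns, here is a minimal sketch that fills the same vector both ways. The sample ob_times values and the counter n_kept are made up for the example and are not part of the question's code.

```r
# Hypothetical sample data in the question's timestamp format.
ob_times <- c("2010-03-12 22:12:20",
              "2010-03-12 23:05:00",
              "2010-03-13 01:30:45")

# Growing pattern: c() builds a new, longer vector on every iteration.
grown <- c()
for (n in seq_along(ob_times)) {
    grown <- c(grown, ob_times[n])
}

# Preallocated pattern: reserve the full length once, then assign by index.
dates <- character(length(ob_times))
n_kept <- 0  # hypothetical counter of the slots actually filled
for (n in seq_along(ob_times)) {
    n_kept <- n_kept + 1
    dates[n_kept] <- ob_times[n]
}
dates <- dates[seq_len(n_kept)]  # drop any slots that were never filled

# Both patterns end up with the same contents.
stopifnot(identical(grown, dates))
```

If only some elements are kept, as in the question where dates is filled conditionally, a counter like n_kept would also have to stand in for the is.null(dates) and length(dates) checks, since a preallocated vector is never NULL and always has its full length.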

Growing a vector with c() inside a loop builds a new copy on every iteration, so this change should give you the most bang for your buck.
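
If preallocation alone is not enough, the vectorization mentioned in the question might look roughly like the sketch below. It follows the plain-language description in the question (each interval counts once, as a hit if any observation falls inside it and as a false otherwise), not necessarily every branch of the original loop. The sample data, the UTC time zone and the helper names fmt, ob, start, end and is_hit are assumptions made for the example.

```r
# Hypothetical sample data in the question's format.
ob_times <- c("2010-03-12 22:12:20", "2010-03-13 01:30:45")
prob_times_start_new <- c("2010-03-12 22:00:00", "2010-03-14 00:00:00",
                          "2010-03-13 01:00:00")
prob_times_end_new   <- c("2010-03-12 23:00:00", "2010-03-14 06:00:00",
                          "2010-03-13 02:00:00")
probs_new <- c("PROB30", "PROB40", "PROB40")

# Parse once so comparisons run on timestamps rather than on strings.
# The time zone is an assumption; use whatever the data was recorded in.
fmt   <- "%Y-%m-%d %H:%M:%S"
ob    <- as.POSIXct(ob_times, format = fmt, tz = "UTC")
start <- as.POSIXct(prob_times_start_new, format = fmt, tz = "UTC")
end   <- as.POSIXct(prob_times_end_new, format = fmt, tz = "UTC")

# For each interval, TRUE if any observation lies inside [start, end].
# The comparison against all of ob is vectorized, so no inner loop is needed.
is_hit <- vapply(seq_along(start),
                 function(t) any(ob >= start[t] & ob <= end[t]),
                 logical(1))

# Count hits and falses per category with logical indexing.
num_of_hits30  <- sum(is_hit  & probs_new == "PROB30")
num_of_hits40  <- sum(is_hit  & probs_new == "PROB40")
num_of_false30 <- sum(!is_hit & probs_new == "PROB30")
num_of_false40 <- sum(!is_hit & probs_new == "PROB40")
```

With this sample data the first and third intervals contain an observation and the second does not, so the counts should come out as one PROB30 hit, one PROB40 hit and one PROB40 false.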

solution1
1 2014-02-13 19:13:22
