Optimizing a loop in R

I am writing this code to go through data and compare values. This is my code:

for (t in 1:(length(prob_times_start_new))){
count3 <- 0
testcount <- 0
dates <- c()
count2 <- 0
for(n in 1:length(ob_times)){
    issue <- substr(prob_times_start_new[t],1,10)
    issue2 <- substr(prob_times_end_new[t],1,10)
    count2 <- count2 + 1
    if (grepl(issue,ob_times[n])|grepl(issue2,ob_times[n])){
        if ((ob_times[n] >= prob_times_start_new[t]) & (ob_times[n] <= prob_times_end_new[t])){
            count3 <- count3 + 1}
        if ((ob_times[n] >= prob_times_start_new[t]) & (ob_times[n] <= prob_times_end_new[t]) & (count3 <= 1)){

            if (probs_new[t] == "PROB30"){
                num_of_hits30 <- num_of_hits30 + 1}
            else if (probs_new[t] == "PROB40"){
                num_of_hits40 <- num_of_hits40 + 1}
            }
        if ((ob_times[n]<prob_times_start_new[t]) | (ob_times[n] > prob_times_end_new[t])){
            testcount <- testcount + 1}
        dates <- c(dates,ob_times[n])
        }

    nums <- length(ob_times)
    if ((!(grepl(issue,ob_times[nums])))&(!(grepl(issue2,ob_times[1])))){

        if (((prob_times_start_new[t]>ob_times[nums])|(prob_times_end_new[t]<ob_times[1]))&count2<=1){

            if (probs_new[t] == "PROB30"){
                num_of_false30 <- num_of_false30 + 1}
            else if (probs_new[t] == "PROB40"){
                num_of_false40 <- num_of_false40 + 1}}}}
if((!(is.null(dates)))){
    if((testcount==length(dates))){

        if (probs_new[t] == "PROB30"){
            num_of_false30 <- num_of_false30 + 1}
        else if (probs_new[t] == "PROB40"){
            num_of_false40 <- num_of_false40 + 1}}}


for (k in 2:length(ob_times)){
    if(((!(grepl(issue,ob_times[k])))&(!(grepl(issue2,ob_times[k]))))&((!(grepl(issue,ob_times[k-1]))) & (!(grepl(issue2,ob_times[k-1]))))){
        if ((prob_times_start_new[t]>ob_times[k-1]) & (prob_times_start_new[t]<ob_times[k]) & (prob_times_end_new[t]>ob_times[k-1]) & (prob_times_end_new[t]<ob_times[k])){

            if (probs_new[t] == "PROB30"){
                num_of_false30 <- num_of_false30 + 1}
            else if (probs_new[t] == "PROB40"){
                num_of_false40 <- num_of_false40 + 1}}}}}

prob_times_start_new, prob_times_end_new, and ob_times are vectors of strings in this format:

"2010-03-12 22:12:20" (Year-Month-Day Hour:Minute:Second)

probs_new is a vector whose elements are either "PROB30" or "PROB40". num_of_false30, num_of_false40, num_of_hits30, and num_of_hits40 are integers that start at 0 and are incremented according to the criteria in the code.

I know this is a lot of code, so please ask if anything is unclear. It is supposed to search through ob_times and check whether any value falls within the interval from the start time to the end time. If one does, it counts as a hit; if not, it counts as a false.

Right now the code works, but it takes about 2 minutes to run. It would save me a lot of time if I could make it faster. I saw some posts about vectorization and tried it myself, but had no luck. If someone could help me out, it would be greatly appreciated. Thanks in advance.
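One way to vectorize the check described in the question is sketched below. It follows the prose description (an interval is a hit if any observation falls inside it, otherwise a false) rather than reproducing every branch of the loop, so treat it as a starting point. The sample data, and the names fmt, ob, starts, ends, and is_hit, are assumptions made for illustration.

```r
# Hypothetical sample data in the question's timestamp format.
ob_times <- c("2010-03-12 22:12:20", "2010-03-13 01:00:00", "2010-03-15 08:30:00")
prob_times_start_new <- c("2010-03-12 22:00:00", "2010-03-14 00:00:00", "2010-03-15 08:00:00")
prob_times_end_new   <- c("2010-03-12 23:00:00", "2010-03-14 06:00:00", "2010-03-15 09:00:00")
probs_new <- c("PROB30", "PROB40", "PROB40")

# Parse the strings once, outside any loop, with a fixed time zone.
fmt    <- "%Y-%m-%d %H:%M:%S"
ob     <- as.POSIXct(ob_times, format = fmt, tz = "UTC")
starts <- as.POSIXct(prob_times_start_new, format = fmt, tz = "UTC")
ends   <- as.POSIXct(prob_times_end_new, format = fmt, tz = "UTC")

# For each interval: TRUE if at least one observation lies in [start, end].
is_hit <- vapply(seq_along(starts),
                 function(t) any(ob >= starts[t] & ob <= ends[t]),
                 logical(1))

# Count hits and falses per probability class with logical sums.
num_of_hits30  <- sum(is_hit  & probs_new == "PROB30")
num_of_hits40  <- sum(is_hit  & probs_new == "PROB40")
num_of_false30 <- sum(!is_hit & probs_new == "PROB30")
num_of_false40 <- sum(!is_hit & probs_new == "PROB40")
```

The inner comparison runs over the whole ob vector at once, so the only remaining R-level loop is over the intervals. The substr and grepl prefilter is no longer needed because the timestamps are compared as date-times.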

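Allocate the vectors before using them. For example, you have

dates <- c()

Replace that with

dates <- character(length(ob_times))

which reserves the largest size the vector can reach. ob_times holds strings, so a character vector is the right type; 'Date' is not a valid mode for vector(). Then, instead of concatenating the dates, assign to the element directly

dates[n] <- value

Since dates is only filled when the grepl test passes, keep a separate counter of filled slots and use it wherever the code currently relies on length(dates) or is.null(dates).

Growing a vector with c() inside a loop copies it on every iteration, so this is a cheap change that usually pays off.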
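A minimal sketch of the preallocation advice, assuming dates should hold strings because ob_times does. The sample data, the fixed issue value, and the counter n_dates are hypothetical and only illustrate the pattern.

```r
# Hypothetical sample data; in the question these come from the real data set.
ob_times <- c("2010-03-12 22:12:20", "2010-03-12 23:40:00", "2010-03-13 01:00:00")
issue <- "2010-03-12"   # date part of an interval start, as in the question

# Preallocate the largest size the result can reach.
dates <- character(length(ob_times))
n_dates <- 0L           # number of slots actually filled

for (n in seq_along(ob_times)) {
  if (grepl(issue, ob_times[n], fixed = TRUE)) {
    n_dates <- n_dates + 1L
    dates[n_dates] <- ob_times[n]   # assign in place instead of c(dates, ...)
  }
}

# Drop the unused tail so length(dates) is the number of matches again.
dates <- dates[seq_len(n_dates)]
```

After trimming, an empty result is character(0) rather than NULL, so a check such as !is.null(dates) would need to become n_dates > 0.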
