
Replace loop with more efficient solution

So my data looks like this:

test <- structure(list(value = c(0, 781, 1109, 57, 250, 541, 533, 320, 
322, 1033, 291, 2213, 1845, 618, 271, 525, 88, 1354, 217, 820, 
786, 119, 41, 316, 153, 378, 172, 615, 383, 168, 1448, 824, 85, 
224310, 1186, 1488, 244, 368, 133, 488, 118, 4505, 1411, 649, 
690, 548, 226, 393, 1042, 92, 521, 212, 1015, 380, 2944, 54376, 
1396, 429, 2725, 171, 1874, 87, 547, 488, 140, 169, 237, 1749, 
1144, 156, 843, 116, 313, 601, 679, 464, 1092, 178, 28, 57, 550, 
498, 64, 48143, 352, 4100, 232, 1936, 189, 940, 180, 1051, 2917, 
2397, 229, 802, 540, 297, 505, 1649), count = c(1L, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 3L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 4L, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"
))

Column value has some random values, and column count is mostly filled with NAs. What I need in the end is for every NA in count to be replaced by the last non-NA value above it. So the first rows should have count == 1, and as soon as count changes to 2, the following rows should have count == 2. So far I am using a loop:

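# Start at row 2 so that test$count[i - 1] never refers to index 0
for (i in seq_len(nrow(test))[-1]) {
  # Carry the previous row's count forward when the current one is missing
  if (is.na(test$count[i])) {
    test$count[i] <- test$count[i - 1]
  }
}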

However, this takes forever. Is there a more efficient way to get the same result as the loop? This would help me out a lot. Thanks in advance!
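For reference, the same "last observation carried forward" step can be vectorized in base R without any package. This is a minimal sketch, not taken from any of the answers; locf_base() is a hypothetical helper name. It relies on cumsum(!is.na(x)), which at every position counts how many non-missing values have been seen so far, and uses that count to index into the non-missing values. Part of why the loop is slow is likely that test$count[i] <- ... goes through the data frame's $<- method on every iteration, whereas this version touches the column once.

```r
# Hypothetical helper: vectorized last-observation-carried-forward in base R
locf_base <- function(x) {
  # Running count of non-missing values: 0 before the first non-NA entry,
  # k from the k-th non-NA entry up to (but not including) the next one
  idx <- cumsum(!is.na(x))
  # Positions before the first non-NA value have nothing to copy: keep them NA
  idx[idx == 0L] <- NA_integer_
  # Look up the most recent non-NA value for every position
  x[!is.na(x)][idx]
}

locf_base(c(NA, 1L, NA, NA, 2L, NA))  # NA 1 1 1 2 2
```

With the question's data, test$count <- locf_base(test$count) fills the whole column in one pass.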

You can use fill() from the tidyr package to do exactly this. By default (.direction = "down") it replaces each NA with the previous non-missing value:

tidyr::fill(test, count)
#> # A tibble: 100 x 2
#>    value count
#>    <dbl> <int>
#>  1     0     1
#>  2   781     1
#>  3  1109     1
#>  4    57     1
#>  5   250     1
#>  6   541     1
#>  7   533     1
#>  8   320     1
#>  9   322     1
#> 10  1033     1
#> # ... with 90 more rows
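The default .direction = "down" is what reproduces the loop. If a column could start with NA, the .direction argument controls what happens to those leading values. A small sketch on a made-up data frame (the data is illustrative, not from the question):

```r
library(tidyr)

# Made-up example: unlike the question's data, the first count is missing
df <- data.frame(value = c(5, 3, 8, 1), count = c(NA, 1L, NA, 2L))

# Default direction "down": carry the last non-NA value forward;
# the leading NA has nothing above it and stays NA
down <- fill(df, count)
down$count    # NA 1 1 2

# "downup": fill downward first, then fill any remaining NAs upward
downup <- fill(df, count, .direction = "downup")
downup$count  # 1 1 1 2
```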

You can also use na.locf() from the zoo package:

library(zoo)
# Replace each NA in count with the last non-NA value above it
test$count <- na.locf(test$count)

Output:

# A tibble: 100 x 2
   value count
   <dbl> <int>
 1     0     1
 2   781     1
 3  1109     1
 4    57     1
 5   250     1
 6   541     1
 7   533     1
 8   320     1
 9   322     1
10  1033     1
# ... with 90 more rows

We can also use na.locf0() from zoo, which always returns a vector of the same length as its input:

library(zoo)
transform(test, count = na.locf0(count))
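The difference between na.locf() and na.locf0() only shows up when a vector starts with NA. na.locf() defaults to na.rm = TRUE and drops such leading NAs, so its result can be shorter than the input, which would break an assignment like test$count <- na.locf(test$count). The question's data starts with count = 1, so it is not affected. A small sketch on a made-up vector:

```r
library(zoo)

# Made-up vector that starts with a missing value
x <- c(NA, 1, NA, NA, 2, NA)

# na.locf() removes the leading NA (na.rm = TRUE by default): 5 values
a <- na.locf(x)
a  # 1 1 1 2 2

# na.locf0() keeps the input length; the leading NA has nothing to carry
b <- na.locf0(x)
b  # NA 1 1 1 2 2
```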

Or use nafill() from data.table for an efficient version:

library(data.table)
# setDT() converts test to a data.table by reference; := then updates count in place
setDT(test)[, count := nafill(count, type = 'locf')]

Output:

test
#      value count
#  1:      0     1
#  2:    781     1
#  3:   1109     1
#  4:     57     1
#  5:    250     1
#  6:    541     1
#  7:    533     1
#  8:    320     1
#  9:    322     1
# 10:   1033     1
# 11:    291     1
# 12:   2213     1
# 13:   1845     1
# 14:    618     1
# ..
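data.table also provides setnafill(), which fills the chosen columns by reference, so no := assignment is needed. A minimal sketch on a made-up table (the data is illustrative; only the column name count mirrors the question):

```r
library(data.table)

# Made-up table; the first count is missing on purpose
dt <- data.table(value = c(5, 3, 8, 1, 9),
                 count = c(NA, 1L, NA, 2L, NA))

# Fill count in place with last observation carried forward;
# the leading NA has no earlier value and stays NA
setnafill(dt, type = "locf", cols = "count")
dt$count  # NA 1 1 2 2
```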
