Sample data:
year <- rep(1980:2015, each = 365)
doy <- rep(1:365, times = 36)
set.seed(125)
val <- sample(0:1, size = 365*36,replace = TRUE)
mat <- as.matrix(cbind(year,doy,val))
mat
is data with years, doy and a value column which is either 1 or 0.
I have another matrix mat1
.
set.seed(123)
mat1 <- apply(matrix(sample(c(230:365), replace = TRUE, size = 4L * 36L), nrow = 36L), 2L, sort)
mat1 <- t(apply(mat1, 1, function(x) x[order(x)]))
head(mat1)
[,1] [,2] [,3] [,4]
[1,] 230 231 233 236
[2,] 235 238 242 242
[3,] 236 242 243 246
[4,] 243 245 247 249
[5,] 247 248 249 250
[6,] 249 250 253 263
mat1
also has 36 rows each for corresponding year in mat
. mat1
contains some doy in four columns for each year.
I want to subset each year from mat
using mat1
. For example, year 1980 in mat
is to be subsetted in three groups:
group 1 from 230 till 231 (1st and second column of row 1 from `mat1`)
group 2 from 232 till 233 (second column + 1 to third column of row 1 from `mat1`)
group 3 from 234 till 236 (third column + 1 to fourth column of row 1 from `mat1`)
This will give me three vectors for 1980. I want to do an rle on each vector to find the longest consecutive occurrence of 1. Something like
group1.rle <- rle(group1)
group2.rle <- rle(group2)
group3.rle <- rle(group3)
max(group1.rle$lengths[group1.rle$values == 1])
max(group2.rle$lengths[group2.rle$values == 1])
max(group3.rle$lengths[group3.rle$values == 1])
and then repeat this for all the years mat
.
Thank you.
This might be a little bit overdone, but I got carried away in the tidyverse (who doesn't know that feeling? ;) )
Packages
# You might as well go with library(tidyverse)
library(dplyr)
library(purrr)
library(tidyr)
Code & Explanation
# Preparation
mat <- as.data.frame(mat)
colnames(mat1) <- c("D1", "D2", "D3", "D4")
mat1 <- cbind(year = 1980:2015, mat1)
mat1 <- as.data.frame(mat1)
I name the columns of mat1
, add the column year
and turn both mat
and mat1
into nice tibble
s so I use left_join
on them. This is important to get the right indicator columns for the right years.
mat_new <- mat %>%
left_join(mat1, by = "year") %>%
mutate(group1 = (doy >= D1 & doy <=D2),
group2 = (doy >= D2 + 1 & doy <=D3),
group3 = (doy >= D3 + 1 & doy <=D4))
mat_new
# A tibble: 13,140 x 10
# year doy val D1 D2 D3 D4 group1 group2 group3
# <int> <int> <int> <int> <int> <int> <int> <lgl> <lgl> <lgl>
# 1 1980 1 1 230 231 233 236 FALSE FALSE FALSE
# 2 1980 2 0 230 231 233 236 FALSE FALSE FALSE
# 3 1980 3 0 230 231 233 236 FALSE FALSE FALSE
# 4 1980 4 0 230 231 233 236 FALSE FALSE FALSE
# 5 1980 5 1 230 231 233 236 FALSE FALSE FALSE
# 6 1980 6 1 230 231 233 236 FALSE FALSE FALSE
# 7 1980 7 1 230 231 233 236 FALSE FALSE FALSE
# 8 1980 8 0 230 231 233 236 FALSE FALSE FALSE
# 9 1980 9 1 230 231 233 236 FALSE FALSE FALSE
# 10 1980 10 1 230 231 233 236 FALSE FALSE FALSE
# ... with 13,130 more rows
So first join the two tibble
and then add group columns whether the doy
is in the subset.
mat_new <- mat_new %>%
gather(group, indicator, group1:group3) %>%
nest(doy, val, indicator)
mat_new
# A tibble: 108 x 7
# year D1 D2 D3 D4 group data
# <int> <int> <int> <int> <int> <chr> <list>
# 1 1980 230 231 233 236 group1 <tibble [365 x 3]>
# 2 1981 235 238 242 242 group1 <tibble [365 x 3]>
# 3 1982 236 242 243 246 group1 <tibble [365 x 3]>
# 4 1983 243 245 247 249 group1 <tibble [365 x 3]>
# 5 1984 247 248 249 250 group1 <tibble [365 x 3]>
# 6 1985 249 250 253 263 group1 <tibble [365 x 3]>
# 7 1986 250 250 255 269 group1 <tibble [365 x 3]>
# 8 1987 255 258 259 269 group1 <tibble [365 x 3]>
# 9 1988 259 259 263 274 group1 <tibble [365 x 3]>
# 10 1989 261 270 273 285 group1 <tibble [365 x 3]>
# ... with 98 more rows
Gather the group columns together and then nest
the data. Now each row contains one year - group combination and the data column stores the columns doy
, val
and indicator
for this particular combination. This makes it easier to calculate the rle
in the next step for all year - group combinations.
mat_new <- mat_new %>%
mutate(group.rle = map(data, ~ .x %>% filter(indicator) %>% pull(val) %>% rle),
max.group.rle = map_dbl(group.rle, ~max(.x$lengths[.x$values == 1])))
mat_new
# A tibble: 108 x 9
# year D1 D2 D3 D4 group data group.rle max.group.rle
# <int> <int> <int> <int> <int> <chr> <list> <list> <dbl>
# 1 1980 230 231 233 236 group1 <tibble [365 x 3]> <S3: rle> 1
# 2 1981 235 238 242 242 group1 <tibble [365 x 3]> <S3: rle> 2
# 3 1982 236 242 243 246 group1 <tibble [365 x 3]> <S3: rle> 1
# 4 1983 243 245 247 249 group1 <tibble [365 x 3]> <S3: rle> 1
# 5 1984 247 248 249 250 group1 <tibble [365 x 3]> <S3: rle> -Inf
# 6 1985 249 250 253 263 group1 <tibble [365 x 3]> <S3: rle> 1
# 7 1986 250 250 255 269 group1 <tibble [365 x 3]> <S3: rle> 1
# 8 1987 255 258 259 269 group1 <tibble [365 x 3]> <S3: rle> 2
# 9 1988 259 259 263 274 group1 <tibble [365 x 3]> <S3: rle> -Inf
# 10 1989 261 270 273 285 group1 <tibble [365 x 3]> <S3: rle> 2
# ... with 98 more rows
With two calls to map
we can get the maximum rle for each group. In the first call, each tibble
in the data
column is filtered by the values stored in the indicator
column, then the val
column is extracted (with pull
) and then finally rle
is applied to these values.
In the second map
call, the rle
stored in the group.rle
column is filtered by your conditions (only lengths of values 1) and the max
is calculated. As this returns a numeric vector of length 1, I use map_dbl
to directly store the result as such.
Note, this call will produce warnings, as not all groups contain the value 1 and thus the maximum has no non-missing arguments after the filtering.
mat_new %>%
select(year, group, max.group.rle) %>%
spread(group, max.group.rle)
# A tibble: 36 x 4
# year group1 group2 group3
# * <int> <dbl> <dbl> <dbl>
# 1 1980 1 -Inf 1
# 2 1981 2 1 -Inf
# 3 1982 1 -Inf -Inf
# 4 1983 1 -Inf 1
# 5 1984 -Inf 1 -Inf
# 6 1985 1 -Inf 2
# 7 1986 1 3 1
# 8 1987 2 -Inf 1
# 9 1988 -Inf -Inf 2
# 10 1989 2 1 3
# # ... with 26 more rows
To have a better look at the results, I only select the columns year
, group
, max.group.rle
and then use spread
, to spread the groups into separate columns. Now we have one line per year with the relevant information.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.