Split a vector or data.frame into intervals by condition and print each interval's first and last value

I have a data.frame that looks like this:

v1 <- c(1:10)
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

> dfb
   v1    v2
1   1 FALSE
2   2 FALSE
3   3  TRUE
4   4 FALSE
5   5 FALSE
6   6 FALSE
7   7  TRUE
8   8 FALSE
9   9 FALSE
10 10 FALSE

I need these operations:

  1. Split the data.frame into intervals wherever v2 is TRUE.
  2. Each row where v2 is TRUE is the last element of its interval.
  3. If the last element is not TRUE, treat it as if it were (this is easily achieved by setting the last position of the vector to TRUE).
  4. Print the first and last v1 value of each interval.

After these operations, my result should look like this:

  > df_final
   Vx Vy
    1 3
    4 7
    8 10

I've tried cumsum on the v2 vector, but each TRUE value ends up as the first element of an interval, not the last:

> split(v2, cumsum(v2==TRUE))
$`0`
[1] FALSE FALSE

$`1`
[1]  TRUE FALSE FALSE FALSE

$`2`
[1]  TRUE FALSE FALSE FALSE

Get df_final

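# Interval ends: every TRUE row plus the last row
# (unique() avoids a duplicate when the last row is already TRUE)
Vy <- unique(c(which(dfb$v2), nrow(dfb)))
# Interval starts: row 1, then the row after each end except the last
Vx <- c(1, Vy[-length(Vy)] + 1)

df_final <- data.frame(Vx, Vy)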
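
The same index arithmetic can be wrapped in a small helper. This is only a sketch: the name `interval_bounds` is hypothetical (it is not part of the answer), and it assumes the flag vector is logical, non-empty and free of NA values.

```r
# Hypothetical helper: start/end positions of each interval,
# where every TRUE in `flag` closes an interval.
interval_bounds <- function(flag) {
  stopifnot(is.logical(flag), length(flag) > 0, !anyNA(flag))
  # The last position always closes an interval, whether or not it is TRUE
  ends <- unique(c(which(flag), length(flag)))
  # Each interval starts right after the previous one ends
  starts <- c(1L, head(ends, -1) + 1L)
  data.frame(Vx = starts, Vy = ends)
}

v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
interval_bounds(v2)
```

For the v2 from the question this gives the intervals 1 to 3, 4 to 7 and 8 to 10, and a vector that already ends in TRUE does not produce a duplicated last row.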

Split Df

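library(data.table)

# Assumes v2 starts with FALSE and TRUE values are isolated: rleid() then gives
# odd ids to FALSE runs and even ids to TRUE rows, and subtracting 1 from the
# even ids merges each TRUE row into the FALSE run before it.
split_ind <- rleid(dfb$v2) - !(rleid(dfb$v2) %% 2)

split(dfb, split_ind)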
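
If loading data.table only for `rleid` is not wanted, the split can probably also be done in base R. The sketch below is an alternative, not the answer's own method: it starts a new group on the row right after each TRUE, so it should also cope with a leading TRUE or consecutive TRUE values. The names `grp` and `pieces` are illustrative.

```r
v1 <- 1:10
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

# A new group begins on the first row and on every row that follows a TRUE,
# so each TRUE row ends up as the last row of its piece.
grp <- cumsum(c(TRUE, head(dfb$v2, -1)))
pieces <- split(dfb, grp)

# First and last v1 of every piece; the final piece ends at the last row
# even when that row is FALSE.
df_final <- data.frame(
  Vx = unname(sapply(pieces, function(p) p$v1[1])),
  Vy = unname(sapply(pieces, function(p) p$v1[nrow(p)]))
)
df_final
```

Because the boundaries are read from the pieces themselves, no extra step is needed to treat the last row as TRUE.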

You can still use cumsum; you just have to shift v2 by one position:

v3 <- c(TRUE,v2[-length(v2)])
v3
 [1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

res <- split(v2,cumsum(v3))
n <- length(res)
res[[n]][length(res[[n]])] <- TRUE  # base R only; last() would need data.table or dplyr
res
$`1`
[1] FALSE FALSE  TRUE

$`2`
[1] FALSE FALSE FALSE  TRUE

$`3`
[1] FALSE FALSE  TRUE

df_final <- data.frame(Vx=which(v3),Vy=which(unlist(res,use.names=F)))
df_final
  Vx Vy
1  1  3
2  4  7
3  8 10

I will also post my answer, heavily inspired by Eldioo's. It also works when the v1 values are non-numeric, and it avoids the split and cumsum functions.

Input:

v1 <- letters[1:10]
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

> dfb
   v1    v2
1   a FALSE
2   b FALSE
3   c  TRUE
4   d FALSE
5   e FALSE
6   f FALSE
7   g  TRUE
8   h FALSE
9   i FALSE
10  j FALSE

Solution:

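# data wrangling (base R only, no packages needed)
dfb["v3"] <- c(TRUE, dfb$v2[-length(dfb$v2)])  # TRUE on the first row of each interval
dfb["v4"] <- dfb$v2                            # TRUE on the last row of each interval
dfb$v4[length(dfb$v4)] <- TRUE                 # the last row always closes an interval
Vx <- dfb$v1[dfb$v3]                           # v1 at interval starts
Vy <- dfb$v1[dfb$v4]                           # v1 at interval ends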

# for debugging purposes
dfb
   v1    v2    v3    v4
1   a FALSE  TRUE FALSE
2   b FALSE FALSE FALSE
3   c  TRUE FALSE  TRUE
4   d FALSE  TRUE FALSE
5   e FALSE FALSE FALSE
6   f FALSE FALSE FALSE
7   g  TRUE FALSE  TRUE
8   h FALSE  TRUE FALSE
9   i FALSE FALSE FALSE
10  j FALSE FALSE  TRUE

# final results
data.frame(Vx, Vy)
  Vx Vy
1  a  c
2  d  g
3  h  j
