
Split a vector or data.frame into intervals by condition and print each interval's first and last value

I have a data.frame that looks like this:

v1 <- c(1:10)
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

> dfb
   v1    v2
1   1 FALSE
2   2 FALSE
3   3  TRUE
4   4 FALSE
5   5 FALSE
6   6 FALSE
7   7  TRUE
8   8 FALSE
9   9 FALSE
10 10 FALSE

I need the following operations:

  1. split the data.frame into intervals wherever v2 is TRUE
  2. rows where v2 is TRUE are the last element of their interval
  3. if the last element is not TRUE, it is treated as if it were (this can easily be achieved by setting the last position of the vector to TRUE)
  4. print the first and last v1 value of each created interval

After these operations, my result should look like this:

> df_final
  Vx Vy
1  1  3
2  4  7
3  8 10

I've tried cumsum on the v2 vector, but TRUE values are treated as the first element of an interval, not the last:

> split(v2, cumsum(v2==TRUE))
$`0`
[1] FALSE FALSE

$`1`
[1]  TRUE FALSE FALSE FALSE

$`2`
[1]  TRUE FALSE FALSE FALSE
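One way to make a TRUE close its interval instead of opening it is to subtract v2 from its own cumulative sum, so each row counts only the TRUEs strictly before it. This is a minimal base R sketch of that idea, not code from any of the answers; `grp` is an illustrative name.

```r
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# cumsum(v2) steps up *on* each TRUE row, so that row opens a new group.
# Subtracting v2 leaves only the TRUEs strictly before each row,
# so a TRUE row stays in the group it closes.
grp <- cumsum(v2) - v2
print(grp)  # 0 0 0 1 1 1 1 2 2 2

# Row positions of each interval; the last group simply runs to the last row,
# so requirement 3 needs no special handling here.
print(split(seq_along(v2), grp))
```

The resulting group ids are one less than cumsum(c(TRUE, v2[-length(v2)])), the shifted-vector variant used in one of the answers, so the two groupings are equivalent.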

Get df_final:

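# Interval ends: rows where v2 is TRUE, plus the last row
# (unique() avoids a duplicate end if the last row is already TRUE)
Vy <- unique(c(which(dfb$v2), nrow(dfb)))
# Interval starts: row 1, then the row after every end except the last
Vx <- c(1, Vy[-length(Vy)] + 1)

df_final <- data.frame(Vx, Vy)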
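Vx and Vy here are row positions, which match the v1 values only because v1 is 1:10. A minimal sketch that looks up v1 at those positions instead, assuming the same dfb layout (`starts` and `ends` are illustrative names):

```r
v1 <- letters[1:10]
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

# Interval ends: every TRUE row, plus the last row (requirement 3).
# unique() avoids a duplicate end if the last row is already TRUE.
ends <- unique(c(which(dfb$v2), nrow(dfb)))
# Interval starts: row 1, then the row after every end except the last
starts <- c(1, head(ends, -1) + 1)

# Print v1 values, not row positions
df_final <- data.frame(Vx = dfb$v1[starts], Vy = dfb$v1[ends])
print(df_final)
```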

Split the data.frame:

library(data.table)

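# rleid() numbers the runs of v2; subtracting 1 from every even run id
# merges each TRUE run into the FALSE run before it.
# Note: this assumes v2 starts with FALSE and never has two TRUEs in a row.
split_ind <- rleid(dfb$v2) - !(rleid(dfb$v2) %% 2)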

split(dfb,split_ind)
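The split alone doesn't yet produce the requested table. A sketch of one way to reduce each piece to its first and last v1 value with lapply(). To keep it runnable without data.table, the grouping here uses cumsum(c(TRUE, head(v2, -1))) in place of the rleid() index (a swapped-in technique), which also copes with a leading TRUE or adjacent TRUEs:

```r
v1 <- c(1:10)
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

# Group id increases on the row *after* each TRUE, so TRUE rows close a group
split_ind <- cumsum(c(TRUE, head(dfb$v2, -1)))
pieces <- split(dfb, split_ind)

# Keep the first and last v1 of every piece, then stack the one-row results
df_final <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(Vx = d$v1[1], Vy = d$v1[nrow(d)])
}))
rownames(df_final) <- NULL
print(df_final)
```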

You can still use cumsum; you just have to slightly adjust v2 by shifting it down one position, so that TRUE marks the first row of each interval:

v3 <- c(TRUE,v2[-length(v2)])
v3
 [1]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE

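res <- split(v2, cumsum(v3))
# Force the last element of the last interval to TRUE (requirement 3);
# base R replacement for last(res), which needs data.table or dplyr
n <- length(res)
res[[n]][length(res[[n]])] <- TRUE
res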
$`1`
[1] FALSE FALSE  TRUE

$`2`
[1] FALSE FALSE FALSE  TRUE

$`3`
[1] FALSE FALSE  TRUE

df_final <- data.frame(Vx=which(v3),Vy=which(unlist(res,use.names=F)))
df_final
  Vx Vy
1  1  3
2  4  7
3  8 10
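Because each cumsum(v3) group already ends at an interval's last row (the final group simply runs to the end of the vector), the first and last values can also be read off per group with tapply(), without editing res. A sketch under that reading; it works for a character v1 as well:

```r
v1 <- 1:10
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
v3 <- c(TRUE, v2[-length(v2)])
grp <- cumsum(v3)

# head(x, 1) / tail(x, 1) return the first / last element of each group
Vx <- tapply(v1, grp, head, 1)
Vy <- tapply(v1, grp, tail, 1)

# as.vector() drops the group names and array dims that tapply() adds
df_final <- data.frame(Vx = as.vector(Vx), Vy = as.vector(Vy))
print(df_final)
```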

I will also post my own answer, heavily inspired by Eldioo's. It also works when v1 contains non-numeric values, and it avoids the split and cumsum functions.

Input:

v1 <- letters[1:10]
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

> dfb
   v1    v2
1   a FALSE
2   b FALSE
3   c  TRUE
4   d FALSE
5   e FALSE
6   f FALSE
7   g  TRUE
8   h FALSE
9   i FALSE
10  j FALSE

Solution:

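# data wrangling (base R only; data.table is not needed here)
# v3: v2 shifted down one row, so TRUE marks the first row of each interval
dfb["v3"] <- c(TRUE, dfb$v2[-length(dfb$v2)])
# v4: copy of v2 with the last row forced to TRUE, so TRUE marks the last row of each interval
dfb["v4"] <- dfb$v2
dfb$v4[length(dfb$v4)] <- TRUE
# v1 values at the interval starts and ends
Vx <- dfb[which(dfb$v3), ]$v1
Vy <- dfb[which(dfb$v4), ]$v1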

# for debugging purposes
dfb
   v1    v2    v3    v4
1   a FALSE  TRUE FALSE
2   b FALSE FALSE FALSE
3   c  TRUE FALSE  TRUE
4   d FALSE  TRUE FALSE
5   e FALSE FALSE FALSE
6   f FALSE FALSE FALSE
7   g  TRUE FALSE  TRUE
8   h FALSE  TRUE FALSE
9   i FALSE FALSE FALSE
10  j FALSE FALSE  TRUE

# final results
data.frame(Vx, Vy)
  Vx Vy
1  a  c
2  d  g
3  h  j
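The v3/v4 idea can be wrapped in a small function and checked against edge cases the example data doesn't cover: a TRUE in the last row, a TRUE in the first row, and two TRUEs in a row. `interval_bounds` is a hypothetical name; this is a base R sketch of the same logic, not the poster's exact code:

```r
# Hypothetical helper: first and last x value of each interval closed by a TRUE flag
interval_bounds <- function(x, flag) {
  stopifnot(length(x) == length(flag), length(flag) > 0)
  n <- length(flag)
  starts <- c(TRUE, flag[-n])  # like v3: TRUE on the first row of each interval
  ends <- flag                 # like v4: TRUE on each interval end ...
  ends[n] <- TRUE              # ... with the last row forced to TRUE
  data.frame(Vx = x[starts], Vy = x[ends])
}

v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
print(interval_bounds(letters[1:10], v2))

print(interval_bounds(1:4, c(FALSE, TRUE, FALSE, TRUE)))  # last row already TRUE
print(interval_bounds(1:3, c(TRUE, FALSE, FALSE)))        # first row TRUE
print(interval_bounds(1:4, c(FALSE, TRUE, TRUE, FALSE)))  # adjacent TRUEs
```

The two columns always line up: sum(starts) and sum(ends) both equal 1 + sum(flag[-n]), so every start has a matching end.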

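Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.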

 
© 2020-2024 STACKOOM.COM