Split a vector or data.frame into intervals by condition and print each interval's first and last value
I have a data.frame which looks like this:
v1 <- c(1:10)
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)
> dfb
   v1    v2
1   1 FALSE
2   2 FALSE
3   3  TRUE
4   4 FALSE
5   5 FALSE
6   6 FALSE
7   7  TRUE
8   8 FALSE
9   9 FALSE
10 10 FALSE
I need these operations:
- split the data.frame into intervals according to v2, wherever v2 is TRUE
- a row where v2 is TRUE will be the last element of its interval
- if the last element of v2 is not TRUE, it will be treated as if it is (this can easily be achieved by setting the last vector position to TRUE)
- print v1 as the first and last element of each created interval

After these operations my results should look like this:
> df_final
Vx Vy
 1  3
 4  7
 8 10
I've tried cumsum on the v2 vector, but the TRUE values are treated as the first interval element, not the last:
> split(v2, cumsum(v2==TRUE))
$`0`
[1] FALSE FALSE
$`1`
[1] TRUE FALSE FALSE FALSE
$`2`
[1] TRUE FALSE FALSE FALSE
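Get df_final:
Vy <- unique(c(which(dfb$v2), nrow(dfb)))  # interval ends: every TRUE row plus the last row (no duplicate if the last row is already TRUE)
Vx <- c(1, Vy[-length(Vy)] + 1)            # interval starts: row 1, then the row after each previous end
df_final <- data.frame(Vx, Vy)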
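The index arithmetic above can be wrapped in a small function, which makes it easier to check edge cases such as a vector that already ends in TRUE. This is only a sketch: the name `interval_bounds` is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper (name is an assumption, not from the original answer):
# returns the first and last row index of every interval, where a TRUE in
# `flag` closes an interval and the final element always closes one.
interval_bounds <- function(flag) {
  n <- length(flag)
  if (n == 0L) return(data.frame(Vx = integer(0), Vy = integer(0)))
  Vy <- unique(c(which(flag), n))    # ends: every TRUE row, plus the final row
  Vx <- c(1L, Vy[-length(Vy)] + 1L)  # starts: the row right after the previous end
  data.frame(Vx = Vx, Vy = Vy)
}

v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
bounds <- interval_bounds(v2)
print(bounds)

# A vector that already ends in TRUE should not yield a duplicated last interval.
bounds_true_end <- interval_bounds(c(FALSE, TRUE, FALSE, TRUE))
print(bounds_true_end)
```

For the example data this should reproduce the 1-3, 4-7, 8-10 intervals asked for in the question.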
Split the data.frame:
library(data.table)
split_ind <- rleid(dfb$v2)-!(rleid(dfb$v2) %% 2)
split(dfb,split_ind)
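The rleid expression works for the example data, but it appears to rely on v2 starting with FALSE and on TRUE values never being adjacent. As a possible alternative, here is a minimal base R sketch that builds the grouping index from a shifted copy of v2. The names `starts`, `group_id` and `pieces` are illustrative, and the sketch assumes dfb has at least one row.

```r
v1 <- 1:10
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)

# Shift v2 down by one row so that a new group starts on the row AFTER each TRUE.
# Assumes dfb has at least one row.
starts <- c(TRUE, head(dfb$v2, -1))
group_id <- cumsum(starts)  # running count of interval starts

# One data.frame per interval; the TRUE row is the last row of each piece.
pieces <- split(dfb, group_id)
print(pieces)
```

Because the index is a running count of interval starts, this does not need data.table.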
You can still use cumsum, you just have to slightly adjust v2:
v3 <- c(TRUE,v2[-length(v2)])
v3
[1] TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
res <- split(v2,cumsum(v3))
n <- length(res)
res[[n]][length(res[[n]])] <- TRUE  # treat the final element as TRUE (base R only, no last() needed)
res
$`1`
[1] FALSE FALSE TRUE
$`2`
[1] FALSE FALSE FALSE TRUE
$`3`
[1] FALSE FALSE TRUE
df_final <- data.frame(Vx=which(v3),Vy=which(unlist(res,use.names=F)))
df_final
  Vx Vy
1  1  3
2  4  7
3  8 10
I will also post my answer, heavily inspired by Eldioo. This one is also useful when v1 holds non-numeric values, and it avoids the split and cumsum functions.
Input:
v1 <- letters[1:10]
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
dfb <- data.frame(v1, v2)
> dfb
   v1    v2
1   a FALSE
2   b FALSE
3   c  TRUE
4   d FALSE
5   e FALSE
6   f FALSE
7   g  TRUE
8   h FALSE
9   i FALSE
10  j FALSE
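Solution:
# data wrangling (base R only, no packages needed)
dfb["v3"] <- c(TRUE, dfb$v2[-length(dfb$v2)])  # TRUE marks the first row of each interval
dfb["v4"] <- dfb$v2                             # TRUE marks the last row of each interval
dfb$v4[length(dfb$v4)] <- TRUE                  # the final row always closes an interval
Vx <- which(dfb$v3)
Vy <- which(dfb$v4)
Vx <- dfb[Vx, ]$v1                              # look up the v1 value at each start row
Vy <- dfb[Vy, ]$v1                              # look up the v1 value at each end row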
# for debugging purposes
dfb
   v1    v2    v3    v4
1   a FALSE  TRUE FALSE
2   b FALSE FALSE FALSE
3   c  TRUE FALSE  TRUE
4   d FALSE  TRUE FALSE
5   e FALSE FALSE FALSE
6   f FALSE FALSE FALSE
7   g  TRUE FALSE  TRUE
8   h FALSE  TRUE FALSE
9   i FALSE FALSE FALSE
10  j FALSE FALSE  TRUE
# final results
data.frame(Vx, Vy)
  Vx Vy
1  a  c
2  d  g
3  h  j
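The same start/end marking can be kept out of the data.frame by working on plain vectors. The sketch below assumes the flag vector is non-empty and contains no NA values; the function name `interval_labels` is made up for illustration.

```r
# Hypothetical helper (name is an assumption): `labels` can be any vector
# (character, factor, dates, ...), `flag` is the logical column in which
# TRUE closes an interval. Assumes `flag` is non-empty and has no NA values.
interval_labels <- function(labels, flag) {
  stopifnot(length(labels) == length(flag), length(flag) > 0)
  is_end <- flag
  is_end[length(is_end)] <- TRUE                # the last row always closes an interval
  is_start <- c(TRUE, is_end[-length(is_end)])  # a start follows every end
  data.frame(Vx = labels[is_start], Vy = labels[is_end])
}

v1 <- letters[1:10]
v2 <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
result <- interval_labels(v1, v2)
print(result)
```

Since every end except the final one is followed by a start, the two index vectors always have the same length, so the data.frame call should not fail on mismatched columns.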