Split vector in R every time vector element changes
I need to split a vector made up of repeated groups of elements every time the element value changes. For example:
test_vector <- c("string1", "string1", "string1", "string2",
"string2", "string1", "string1", "string3")
must become:
$`1`
[1] "string1" "string1" "string1"
$`2`
[1] "string2" "string2"
$`3`
[1] "string1" "string1"
$`4`
[1] "string3"
If I try
split(test_vector, test_vector)
I get the wrong output, because all equal values are merged into one group regardless of position:
$string1
[1] "string1" "string1" "string1" "string1" "string1"
$string2
[1] "string2" "string2"
$string3
[1] "string3"
I wrote some code which achieves this, but it seems unnecessarily long and I feel like I'm missing something much simpler:
# find indices where splitting will occur:
split_points <- rep(FALSE, length(test_vector))
for (i in 1:length(test_vector)) {
  if (i != 1) {
    if (test_vector[i] != test_vector[i - 1]) {
      split_points[i] <- TRUE
    }
  }
}
split_points <- c(1, which(split_points))
# create split vector:
split_code <- rep(1, length(test_vector))
for (j in 1:length(split_points)) {
  if (j != length(split_points)) {
    split_code[
      split_points[j]:(split_points[j + 1] - 1)
    ] <- j
  } else {
    split_code[
      split_points[j]:length(test_vector)
    ] <- j
  }
}
split_result <- split(test_vector, split_code)
$`1`
[1] "string1" "string1" "string1"
$`2`
[1] "string2" "string2"
$`3`
[1] "string1" "string1"
$`4`
[1] "string3"
If anyone could help me find a simpler solution, it would be much appreciated!
In base R, we can use rle to get the run-length encoding of the vector and build a group id from it:
grp <- with(rle(test_vector), rep(seq_along(values), lengths))
Use that to split the vector:
split(test_vector, grp)
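To see why this works, it may help to look at the intermediate values. The following is only an illustrative walk-through on the question's test_vector; the variable names runs and split_result are chosen here for readability and are not part of the answer above.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# rle() collapses consecutive repeats into run lengths and run values
runs <- rle(test_vector)
runs$lengths  # 3 2 2 1
runs$values   # "string1" "string2" "string1" "string3"

# Repeat each run's index by its length: one group id per element
grp <- rep(seq_along(runs$values), runs$lengths)
grp           # 1 1 1 2 2 3 3 4

# Splitting on the run id keeps the two "string1" runs apart
split_result <- split(test_vector, grp)
```

Because the group id is the run number rather than the value itself, the second block of "string1" ends up in its own list element.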
With data.table, rleid gives a run id that increases each time adjacent elements differ:
library(data.table)
split(test_vector, rleid(test_vector))
f = cumsum(c(TRUE, test_vector[-length(test_vector)] != test_vector[-1]))
split(test_vector, f)
OR
with(rle(test_vector), Map(rep, values, lengths))
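The cumsum version compares each element with its predecessor and counts the changes so far. Below is a minimal sketch that wraps it in a function; the name split_on_change is made up for this illustration, and the sketch assumes the input has no NA values (a comparison with NA yields NA, which would break the running count). It also shows that the two variants return the same runs but name them differently.

```r
# Hypothetical helper; assumes x is an atomic vector without NA values
split_on_change <- function(x) {
  if (length(x) == 0) return(list())
  # TRUE at position 1 and wherever an element differs from its predecessor
  changed <- c(TRUE, x[-1] != x[-length(x)])
  # The running count of changes is the run id of each element
  split(x, cumsum(changed))
}

test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

res_cumsum <- split_on_change(test_vector)
res_map <- with(rle(test_vector), Map(rep, values, lengths))

names(res_cumsum)  # "1" "2" "3" "4"
names(res_map)     # "string1" "string2" "string1" "string3"
```

If numbered names are wanted from the Map variant, unname(res_map) or setNames(res_map, seq_along(res_map)) should do.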
A base R option is to use findInterval + cumsum + rle, i.e.,
res <- split(test_vector,
             findInterval(seq_along(test_vector),
                          cumsum(rle(test_vector)$lengths),
                          left.open = TRUE))
such that
> res
$`0`
[1] "string1" "string1" "string1"
$`1`
[1] "string2" "string2"
$`2`
[1] "string1" "string1"
$`3`
[1] "string3"
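As a rough illustration of what findInterval is doing here, the intermediate values can be inspected as below. The names run_ends and ids are introduced only for this sketch. Note that the ids returned by findInterval start at 0 for this input, so the list names produced by split are "0" to "3"; adding 1 to the ids gives one-based labels if those are preferred.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# Cumulative run lengths mark the last position of each run
run_ends <- cumsum(rle(test_vector)$lengths)
run_ends  # 3 5 7 8

# With left.open = TRUE, position p gets the number of run ends strictly below p
ids <- findInterval(seq_along(test_vector), run_ends, left.open = TRUE)
ids       # 0 0 0 1 1 2 2 3

# Shift by one for labels "1" to "4"
res_one_based <- split(test_vector, ids + 1L)
```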