Split a vector in R every time the element value changes

Question

I need to split a vector containing repeated groups of elements every time the element value changes. For example:

test_vector <- c("string1", "string1", "string1", "string2", 
  "string2", "string1", "string1", "string3")

must become:

$`1`
[1] "string1" "string1" "string1"

$`2`
[1] "string2" "string2"

$`3`
[1] "string1" "string1"

$`4`
[1] "string3"

If I try split(test_vector, test_vector), I get the wrong output:

$string1
[1] "string1" "string1" "string1" "string1" "string1"

$string2
[1] "string2" "string2"

$string3
[1] "string3"
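
A short sketch of why this happens, assuming the same test_vector as above: split() treats its second argument as a factor, so there is one group per distinct value and the two separate runs of "string1" are merged.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# split() groups by factor level, i.e. one group per distinct value
levels(factor(test_vector))       # "string1" "string2" "string3"

# The integer codes show both "string1" runs landing in group 1
codes <- as.integer(factor(test_vector))
codes                             # 1 1 1 2 2 1 1 3
```

What is needed instead is a grouping vector whose value changes at every run boundary.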

I wrote some code that achieves this, but it seems unnecessarily long and I feel like I'm missing something much simpler:

# find indices where splitting will occur:
split_points <- rep(F, length(test_vector))
for (i in 1:length(test_vector)) {
  if (i != 1) {
    if (test_vector[i] != test_vector[i-1]) {
      split_points[i] <- T
    }
  }
}
split_points <- c(1, which(split_points))

# create split vector:
split_code <- rep(1, length(test_vector))
for ( j in 1:length(split_points) ) {

  if (j!=length(split_points)) {
    split_code[
      split_points[j]:(split_points[j+1]-1)
    ] <- j
  } else {
    split_code[
      split_points[j]:length(test_vector)
    ] <- j
  }

}

split_result <- split(test_vector, split_code)

which gives:

$`1`
[1] "string1" "string1" "string1"

$`2`
[1] "string2" "string2"

$`3`
[1] "string1" "string1"

$`4`
[1] "string3"

If anyone could help me find a simpler solution, it would be much appreciated!

Answer 1

In base R, we can use rle to get the run-length encoding of the vector and build one group id per run:

grp <- with(rle(test_vector), rep(seq_along(values), lengths))

Use that to split the vector:

split(test_vector, grp)
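
To make the intermediate steps visible, here is the same idea unpacked; the variable name runs is just for illustration.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# rle() returns the value of each run and how long it is
runs <- rle(test_vector)
runs$values    # "string1" "string2" "string1" "string3"
runs$lengths   # 3 2 2 1

# Number the runs 1, 2, 3, ... and repeat each number to its run length
grp <- rep(seq_along(runs$values), runs$lengths)
grp            # 1 1 1 2 2 3 3 4

result <- split(test_vector, grp)
```

Because the ids come from the run position rather than the run value, the second block of "string1" gets its own group.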

With data.table, rleid returns a run id that increases each time adjacent elements differ:

library(data.table)
split(test_vector, rleid(test_vector))

Answer 2

f = cumsum(c(TRUE, test_vector[-length(test_vector)] != test_vector[-1]))
split(test_vector, f)

Or:

with(rle(test_vector), Map(rep, values, lengths))
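
A sketch of how the two variants above work, assuming the test_vector from the question; the names changed, by_run and pieces are only illustrative.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# TRUE wherever an element differs from the one before it
changed <- test_vector[-1] != test_vector[-length(test_vector)]
changed   # FALSE FALSE TRUE FALSE TRUE FALSE TRUE

# The leading TRUE opens the first group; cumsum() then counts the changes
f <- cumsum(c(TRUE, changed))
f         # 1 1 1 2 2 3 3 4

# The Map() variant rebuilds each run from its value and length, and
# names every piece after its run value, so names can repeat
by_run <- with(rle(test_vector), Map(rep, values, lengths))
names(by_run)   # "string1" "string2" "string1" "string3"

# unname() drops those names if only the pieces matter
pieces <- unname(by_run)
```

Note that the Map() version returns a list named by value rather than by position, so it does not print with the $`1`, $`2`, ... labels shown in the question.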

Answer 3

A base R option is to use findInterval + cumsum + rle, i.e.,

res <- split(test_vector,
             findInterval(seq_along(test_vector),
                          cumsum(rle(test_vector)$lengths),
                          left.open = TRUE))

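which produces the desired groups. Note that the list names start at 0, because findInterval returns 0 for every index in the first run:

> res
$`0`
[1] "string1" "string1" "string1"

$`1`
[1] "string2" "string2"

$`2`
[1] "string1" "string1"

$`3`
[1] "string3"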
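
For anyone wondering what findInterval is doing here, a minimal sketch with the intermediate values spelled out; the names ends and idx are illustrative, and the + 1L shift is my own addition for 1-based list names, not part of the answer above.

```r
test_vector <- c("string1", "string1", "string1", "string2",
                 "string2", "string1", "string1", "string3")

# Position of the last element of each run
ends <- cumsum(rle(test_vector)$lengths)
ends   # 3 5 7 8

# With left.open = TRUE the intervals are (ends[i], ends[i + 1]], so
# each index is mapped to the number of runs that end strictly before it
idx <- findInterval(seq_along(test_vector), ends, left.open = TRUE)
idx    # 0 0 0 1 1 2 2 3

# Shifting by one gives list names "1" to "4" instead of "0" to "3"
res <- split(test_vector, idx + 1L)
names(res)   # "1" "2" "3" "4"
```

Without left.open = TRUE, the last element of each run would be assigned to the following group, since the default intervals are closed on the left.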

Answer 1: score 3, accepted, 2020-03-09 21:49:19
Answer 2: score 1, 2020-03-09 21:50:16
Answer 3: score 1, 2020-03-09 22:06:49