Split a vector into chunks by a separator

Question

I have the following structure:

timestamp = c(1,2,3,4,5,6,7,8,9,10)
values = c(1337,42,NA,23,67,2,NA,NA,NA,5)
df = data.frame(timestamp,values)

#    timestamp values
# 1          1   1337
# 2          2     42
# 3          3     NA
# 4          4     23
# 5          5     67
# 6          6      2
# 7          7     NA
# 8          8     NA
# 9          9     NA
# 10        10      5

Now I want to know how many coherent chunks there are (in this case 3: [1337,42], [23,67,2] and [5]). Maybe I can even split it into sub data frames or something similar?

Answer 1

Here is a way to do this with data.table:

library(data.table)
timestamp = c(1,2,3,4,5,6,7,8,9,10)
values = c(1337,42,NA,23,67,2,NA,NA,NA,5)
dt = data.table(timestamp,values)
dt[, previous:=shift(values)]
res <- dt[!is.na(values) & is.na(previous), .N]
res
# [1] 3
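
The idea in the snippet above is that a chunk starts wherever a value is present and the value before it is missing. A minimal base R sketch of the same count follows; the names `previous` and `starts` are only illustrative, and the shift is built by hand rather than with data.table's `shift`.

```r
values <- c(1337, 42, NA, 23, 67, 2, NA, NA, NA, 5)

# Shift by one position; the first element has no predecessor, so pad with NA
previous <- c(NA, head(values, -1))

# A chunk starts at every non-NA value whose predecessor is NA
starts <- !is.na(values) & is.na(previous)

sum(starts)    # number of chunks: 3
which(starts)  # positions where chunks begin: 1 4 10
```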

As performance was mentioned in the comments, here is a benchmark of the solutions (on 1e5 rows):

    Unit: milliseconds
          expr        min        lq      mean     median        max neval
     dt[shift]   3.966346   4.03936   4.90822   4.728635   7.345617    10
 split[cumsum]  15.693565  17.38094  18.79429  17.739346  31.370630    10
           rle  42.375227  42.65068  45.82327  45.326625  51.473468    10
         dplyr 645.156377 655.90239 676.37797 678.966334 711.393856    10
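
The code behind this benchmark is not shown, so the following is only a rough sketch of how a comparable timing could be set up in base R. The test vector, the share of NAs, and the helper names `count_shift`, `count_rle` and `count_split` are all assumptions, and the helpers are simplified base R stand-ins rather than the exact code from the answers.

```r
set.seed(1)
# Assumed test data: 1e5 values, with NA as one of 101 equally likely outcomes
x <- sample(c(NA, 1:100), 1e5, replace = TRUE)

# Count chunk starts: non-NA value preceded by NA (or by nothing)
count_shift <- function(x) sum(!is.na(x) & is.na(c(NA, x[-length(x)])))

# Count runs of non-NA values
count_rle <- function(x) sum(rle(!is.na(x))$values)

# Split non-NA values by the running NA count and count the groups
count_split <- function(x) {
  keep <- !is.na(x)
  length(split(x[keep], cumsum(is.na(x))[keep]))
}

# All three approaches should agree before timing them
stopifnot(count_shift(x) == count_rle(x), count_rle(x) == count_split(x))

# Time 10 repetitions of each approach
t_shift <- system.time(for (i in 1:10) count_shift(x))
t_rle   <- system.time(for (i in 1:10) count_rle(x))
t_split <- system.time(for (i in 1:10) count_split(x))
```

Absolute timings depend on the machine and on how the NAs are distributed, so they will not match the table above.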

Answer 2

Using the dplyr library, you can do something like this:

library(dplyr)
timestamp = c(1,2,3,4,5,6,7,8,9,10)
values = c(1337,42,NA,23,67,2,NA,NA,NA,5)
df = data.frame(timestamp,values)
df %>%
  mutate(id = cumsum(is.na(values) | is.na(lag(values)))) %>%
  filter(!is.na(values)) %>%
  group_by(id) %>%
  summarise(chunks = paste(values, collapse = ',')) %>%
  select(-id)

The output is:

Source: local data frame [3 x 1]

   chunks
    <chr>
1 1337,42
2 23,67,2
3       5

Answer 3

You can also use the rle and rleid functions:

library(data.table)
values = c(1337,42,NA,23,67,2,NA,NA,NA,5)
split(values, rleid(is.na(values)))[rle(!is.na(values))$values]
$`1`
[1] 1337   42

$`3`
[1] 23 67  2

$`5`
[1] 5
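
If only the number of chunks or their lengths is needed, base R's `rle` is enough on its own. This is a small sketch under that assumption; the names `r`, `n_chunks` and `chunk_lengths` are illustrative.

```r
values <- c(1337, 42, NA, 23, 67, 2, NA, NA, NA, 5)

# Run-length encode the "is present" flag: TRUE runs are chunks, FALSE runs are gaps
r <- rle(!is.na(values))

n_chunks <- sum(r$values)             # number of TRUE runs: 3
chunk_lengths <- r$lengths[r$values]  # length of each chunk: 2 3 1
```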

Answer 4

As rawr suggested in the comments, I am using the following solution:

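foo <- function( x ){
   # running count of NAs seen so far; every NA starts a new group index
   idx <- 1 + cumsum( is.na( x ) )
   # flag the non-missing values
   not.na <- ! is.na( x )
   # split the non-missing values by group index, one list element per chunk
   result <- split( x[not.na], idx[not.na] )
   return(result)
}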

Reasons:

It was the first solution
It works
I understand it
It does not use any packages/libraries.
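
As a usage sketch, the function can be applied to the vector from the question, and the same index can be used to split the data frame into sub data frames, as the question asked. The names `chunks` and `sub_dfs` are only illustrative, and the function is repeated here so that the snippet runs on its own.

```r
# Data from the question
timestamp <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
values <- c(1337, 42, NA, 23, 67, 2, NA, NA, NA, 5)
df <- data.frame(timestamp, values)

# Function from the answer, repeated so the snippet is self-contained
foo <- function(x) {
  idx <- 1 + cumsum(is.na(x))
  not.na <- !is.na(x)
  split(x[not.na], idx[not.na])
}

chunks <- foo(values)
length(chunks)  # number of chunks: 3

# The same grouping applied to whole rows gives one data frame per chunk
not.na <- !is.na(df$values)
idx <- 1 + cumsum(is.na(df$values))
sub_dfs <- split(df[not.na, ], idx[not.na])
```

The list names ("1", "2", "5") come from the running NA count, so they are not consecutive; `unname(sub_dfs)` drops them if they are not wanted.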

Still, thanks for all the answers!

I will mark this as answered as soon as I can (in two days).


4 answers

Answer 1
5 2016-05-24 23:05:31

Answer 2
3 2016-05-24 22:57:38

Answer 3
3 2016-05-24 23:03:40

Answer 4
3 accepted 2016-05-24 23:15:34
