Get runs of consecutive integers of a certain length and sample from the first values

Question

I am trying to create a function that returns the first integer of a subset of a vector, such that the values of the subset are consecutive integers (each increasing by 1) and the subset has a specified length.

For example, using the input data 'v' and a specified length 'l' of 3:

v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
l <- 3

The possible sub-vectors of consecutive values of length 3 would be:

c(3, 4, 5)
c(4, 5, 6)
c(25, 26, 27)

Then I want to randomly choose one of these vectors and return its first (lowest) number, i.e. 3, 4, or 25.

Answer 1

Here's an approach with base R:

First, we create all possible sub-vectors of length length. Next, we keep only those sub-vectors whose successive differences all equal 1. The is.na test ensures that the last sub-vectors, which run past the end of the input and contain NA, are also filtered out. Then we bind the remaining vectors into a matrix and sample from its first column.

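SampleSequencialVectors <- function(vec, length){
  # every window of `length` elements; windows running past the end contain NA
  all.vecs <- lapply(seq_along(vec), function(x) vec[x:(x + (length - 1))])
  # keep windows whose successive differences are all exactly 1 (NA fails the test)
  seq.vec <- all.vecs[sapply(all.vecs, function(x) all(diff(x) == 1 & !is.na(diff(x))))]
  # one row per window; column 1 holds the first values
  sample(do.call(rbind, seq.vec)[, 1], 1)
}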

replicate(10, SampleSequencialVectors(v, 3))
# [1]  3  4  3  3  4  4 25 25  3 25
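
One caveat worth noting: when sample() is given a single number n >= 1, it samples from 1:n, so if only one window qualifies (say the one starting at 25) the function above can return any integer from 1 to 25. It also fails when no window qualifies. A minimal sketch that samples a position instead of a value; the name SampleSequentialSafe and the choice to return NA when nothing qualifies are assumptions made here, not part of the original answer:

```r
SampleSequentialSafe <- function(vec, len) {
  # All windows of `len` elements; those running past the end contain NA
  all.vecs <- lapply(seq_along(vec), function(i) vec[i:(i + len - 1)])
  # Keep windows without NA whose successive differences are all exactly 1
  ok <- vapply(all.vecs, function(x) !anyNA(x) && all(diff(x) == 1), logical(1))
  starts <- vec[ok]                      # window i starts at vec[i]
  if (length(starts) == 0) return(NA)    # assumption: NA signals "no such run"
  # Sample a position, not a value, so a single candidate is returned as-is
  starts[sample.int(length(starts), 1)]
}

SampleSequentialSafe(c(1, 2, 25, 26, 27), 3)
# [1] 25
```

With the question's data the result is still one of 3, 4, or 25.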

Or, if you prefer a tidyverse-style approach:

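library(magrittr)  # provides %>%

SampleSequencialVectorsPurrr <- function(vec, length){
  vec %>%
    seq_along %>%
    # every window of `length` elements starting at each position
    purrr::map(~vec[.x:(.x+(length-1))]) %>%
    # keep windows whose successive differences are all exactly 1
    purrr::keep(~ all(diff(.x) == 1 & !is.na(diff(.x)))) %>%
    # note: purrr::invoke() is deprecated as of purrr 1.0.0
    purrr::invoke(rbind,.) %>%
    {sample(.[,1],size = 1)}
}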
replicate(10, SampleSequencialVectorsPurrr(v, 3))
# [1]  4 25 25  3 25  4  4  3  4 25

Answer 2

Split the vector into runs of consecutive values*: split(v, cumsum(c(1L, diff(v) != 1)))
Select the runs whose length is at least the limit: runs[lengths(runs) >= lim]
From each such run, select the possible first values: x[1:(length(x) - lim + 1)]
From all possible first values, sample one.

 lim = 3  # required run length ('l' in the question)
 runs = split(v, cumsum(c(1L, diff(v) != 1)))
 first = lapply(runs[lengths(runs) >= lim], function(x) x[1:(length(x) - lim + 1)])
 sample(unlist(first), 1)
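
Worked through on the question's data, the steps above produce the following intermediate values. This is only an illustrative walk-through; the names run_id, long, and candidates are introduced here and do not appear in the answer:

```r
v   <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
lim <- 3

# Run id per element: increases by one wherever the step is not exactly 1
run_id <- cumsum(c(1L, diff(v) != 1))
run_id
# [1] 1 1 1 1 2 2 3 3 3

runs <- split(v, run_id)              # three runs: 3..6, 15..16, 25..27
long <- runs[lengths(runs) >= lim]    # drops the short run 15..16

# A run of length n contains n - lim + 1 windows of length lim
first <- lapply(long, function(x) x[seq_len(length(x) - lim + 1)])
candidates <- unname(unlist(first))
candidates
# [1]  3  4 25
```

The final step then draws one element from candidates.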

Here we loop over the runs of sufficient length rather than over every individual value (see the other answers), so it may be faster on larger vectors (I haven't tested this).
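
Since the speed claim is untested, here is a rough way to check it. The test vector and the function names by_window and by_run are hypothetical, chosen for illustration; both functions restate the two approaches so that they return the full set of candidate first values rather than a single sample:

```r
set.seed(1)
big <- sort(sample.int(2e4, 1e4))   # hypothetical data: 10,000 sorted distinct integers
lim <- 3

# Window-based approach (in the style of Answer 1): one window per element
by_window <- function(v, lim) {
  ok <- vapply(seq_along(v), function(i) {
    w <- v[i:(i + lim - 1)]
    !anyNA(w) && all(diff(w) == 1)
  }, logical(1))
  v[ok]
}

# Run-based approach (this answer): one step per sufficiently long run
by_run <- function(v, lim) {
  runs <- split(v, cumsum(c(1L, diff(v) != 1)))
  long <- runs[lengths(runs) >= lim]
  unname(unlist(lapply(long, function(x) x[seq_len(length(x) - lim + 1)])))
}

# Both should list the same candidate first values
stopifnot(identical(by_window(big, lim), by_run(big, lim)))

system.time(by_window(big, lim))
system.time(by_run(big, lim))
```

Timings depend on the machine and on how the data splits into runs, so treat any single measurement as indicative only.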

Slightly more compact using data.table:

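 library(data.table)
 # lim is the required run length, as above; within each group, v is that run's values
 sample(data.table(v)[ , if(.N >= lim) v[1:(length(v) - lim + 1)],
                        by = .(cumsum(c(1L, diff(v) != 1)))]$V1, 1)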

*Credits to the nice canonical: How to split a vector into groups of consecutive sequences?

Answer 3

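Base R, in a few lines. Please note that this solution assumes v is sorted. It also only works for l = 3, because each candidate consists of a value together with its immediate neighbours (the values within a distance of 1).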

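consec_seq <- sapply(seq_along(v), function(i)split(v, abs(v - v[i]) > 1)[1])
cand <- consec_seq[lengths(consec_seq) == l]
# sample among the candidates actually found (not 1:l) and return the first value
cand[[sample.int(length(cand), 1)]][1]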

As a reusable function (not assuming v is sorted):

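conseq_split_sample <- function(vec, n){
  v <- sort(vec)
  # for each value, the group of values within a distance of 1 of it
  consec_seq <- sapply(seq_along(v), function(i)split(v, abs(v - v[i]) > 1)["FALSE"])
  cand <- consec_seq[lengths(consec_seq) == n]
  # sample among the candidates actually found and return the first value
  cand[[sample.int(length(cand), 1)]][1]
}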
conseq_split_sample(v, l)

Data:

 l <- 3
 v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)

Answer 4

Tooting my own horn: cgwtools::seqle is like rle, but you can specify the desired increment in a run. seqle(x, incr = 0, ...) is the same as rle(x).

Then just grab the run lengths and starting values from the result.
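
To illustrate that last step without depending on the package, here is a base R stand-in. It does not call cgwtools::seqle; the helper run_summary is hypothetical and simply assumes a result shaped like rle output, with a lengths component and a values component holding each run's starting value. The name sample_run_start and the NA return for "no qualifying run" are likewise assumptions:

```r
# Hypothetical stand-in for a seqle-style summary: run lengths and starting values
run_summary <- function(x, incr = 1) {
  starts_here <- c(TRUE, diff(x) != incr)          # TRUE where a new run begins
  list(lengths = tabulate(cumsum(starts_here)),    # number of elements per run
       values  = x[starts_here])                   # first value of each run
}

sample_run_start <- function(x, n) {
  rs <- run_summary(x)
  ok <- rs$lengths >= n
  # A run starting at s with length len contains starts s, s + 1, ..., s + len - n
  starts <- unlist(Map(function(s, len) s + seq_len(len - n + 1) - 1,
                       rs$values[ok], rs$lengths[ok]))
  if (length(starts) == 0) return(NA)              # assumption: NA means no run is long enough
  starts[sample.int(length(starts), 1)]
}

v <- c(3, 4, 5, 6, 15, 16, 25, 26, 27)
run_summary(v)$lengths
# [1] 4 2 3
run_summary(v)$values
# [1]  3 15 25
sample_run_start(v, 3)   # one of 3, 4, 25
```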

Answer 1
3 Accepted 2020-06-12 15:04:13

Answer 2
3 2020-06-12 16:16:17

Answer 3
2 2020-06-12 15:38:12

Answer 4
0 2020-06-12 15:07:27
