
Identify series of values preceded and followed by a certain value

I have a sequence of integers. What I would like to do is identify all sequences of 3's that are preceded AND followed by a 5. For example:

c(5,3,3,5,5,4,3,3,5)

The desired output would be:

c(F,T,T,F,F,F,F,F,F)

Explanation: The first sequence of 3's is preceded and followed by a 5, hence TRUE. The second sequence is followed by a 5 but preceded by a 4, hence FALSE.

I couldn't come up with a smarter solution, so here is a for loop:

x <- c(5,3,3,5,5,4,3,3,5) #Initial vector
current_inds <- integer() #Indices of the current run of 3s
saw_3 <- FALSE  #If 3 was seen since the last non-3 value
output <- rep(FALSE, length(x))  #output vector
num_to_check <- 5   #Value to compare
last_val <- NA #Last non-3 value (NA until one is seen)

for (i in seq_along(x)) {
    #If the current value is not equal to 3
    if (x[i] != 3) {
      #Check if a run of 3s just ended and if both the previous non-3 value
      #and the current value are 5
      if (saw_3 && isTRUE(last_val == num_to_check) && x[i] == num_to_check) {
         #Change the group of 3 indices to TRUE
         output[current_inds] <- TRUE
      }
      #The run (if any) is over: reset the flag and the stored indices
      saw_3 <- FALSE
      current_inds <- integer()
      #Update the last seen non-3 value to current value
      last_val <- x[i]
    } else {
      #If it is a 3 then append the index to current_inds
      current_inds <- c(current_inds, i)
      #Make saw_3 flag TRUE
      saw_3 <- TRUE
    }
}

output
#[1] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

I have a very lengthy and ugly solution, but it works. I hope someone can find a cleaner one. I first create a matrix whose first column holds each value with consecutive repeats collapsed (not unique values, just no consecutive duplicates) and whose second column holds the number of times that value is repeated in its run. Then I apply a logical function to see whether a 3 is surrounded by 5s, and in a final step I expand the result back to the original length using the rep() function.

x <- c(5,3,3,5,5,4,3,3,5)

x_reduced <- x[x!=c(x[-1], FALSE)]
x_mat <- matrix(0, ncol = 3, nrow = length(x_reduced))
x_mat[ , 1] <- x_reduced

ctr = 1
x_ctr = 1
while (ctr < length(x)) {
  x_mat[x_ctr ,1] = x[ctr]
  x_mat[x_ctr, 2] = x_mat[x_ctr, 2] + 1 
  if(x[ctr+1] == x[ctr]){
    ctr = ctr + 1
  } else {
    x_ctr = x_ctr + 1
    ctr = ctr + 1
  }
}
x_mat[nrow(x_mat), 1] <- x[length(x)]
x_mat[nrow(x_mat), 2] <- x_mat[nrow(x_mat), 2] + 1

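check_element <- function(pos) {
  #The first and last runs cannot be surrounded on both sides
  if(pos == 1 || pos == nrow(x_mat)) return(FALSE)
  #The run itself must be 3s, with a run of 5s on each side
  x_mat[pos, 1] == 3 && x_mat[pos+1, 1] == 5 && x_mat[pos-1, 1] == 5
}

x_mat[,3] <- sapply(1:nrow(x_mat), check_element)
#The matrix is numeric, so convert the 0/1 flags back to logical
as.logical(rep(x_mat[,3], x_mat[,2]))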
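
The run table that the while loop builds by hand is what base R's rle() returns directly, so the same idea can be sketched in a few lines. The function name surrounded_runs and its arguments target and border are made up for this illustration; this is a sketch of the approach, not a tested replacement for the answer above.

```r
# Hypothetical helper (name and arguments are illustrative)
surrounded_runs <- function(x, target = 3, border = 5) {
  r <- rle(x)                          # r$values: one entry per run, r$lengths: run sizes
  n <- length(r$values)
  prev_val <- c(NA, r$values[-n])      # value of the run before each run
  next_val <- c(r$values[-1], NA)      # value of the run after each run
  # A run qualifies if it is a run of `target` with `border` on both sides;
  # the NA neighbours of the first and last run are turned into FALSE
  hit <- r$values == target &
    !is.na(prev_val) & prev_val == border &
    !is.na(next_val) & next_val == border
  rep(hit, r$lengths)                  # expand back to the original length
}

surrounded_runs(c(5,3,3,5,5,4,3,3,5))
#[1] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
```

Because the comparison is done once per run rather than once per element, no loop or index bookkeeping is needed.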

There's room for optimization, but it's certainly possible with dplyr and rle().

> df_result
# A tibble: 9 x 1
  result
  <lgl> 
1 FALSE 
2 TRUE  
3 TRUE  
4 FALSE 
5 FALSE 
6 FALSE 
7 FALSE 
8 FALSE 
9 FALSE 

Code

df_result <- df %>%
    group_by(seq = {seq = rle(value); rep(seq_along(seq$lengths), seq$lengths)}) %>%
    ungroup() %>%
    mutate(last_3 = case_when(lag(seq) != seq ~ as.numeric(lag(value) == 5),
                              TRUE ~ NA_real_),
           next_5 = case_when(lead(seq) != seq ~ as.numeric(lead(value) == 5),
                              TRUE ~ NA_real_)) %>%
    group_by(seq, value) %>%
    mutate(result = case_when(value == 3 & sum(last_3, na.rm = TRUE) + sum(next_5, na.rm = TRUE) == 2 ~ TRUE,
                              TRUE ~ FALSE)) %>%
    ungroup() %>%
    select(result)

Data

library(dplyr)
df <- data.frame(value = c(5,3,3,5,5,4,3,3,5))
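
The grouping column seq in the pipeline is a run id: rle() gives the length of each run, and repeating the run number that many times labels every element with the run it belongs to. A minimal base R sketch of that one step on the same data (the variable names runs and run_id are illustrative only):

```r
x <- c(5,3,3,5,5,4,3,3,5)
runs <- rle(x)                                        # lengths: 1 2 2 1 2 1
run_id <- rep(seq_along(runs$lengths), runs$lengths)  # one id per run, repeated per element
run_id
#[1] 1 2 2 3 3 4 5 5 6
```

With this column in place, lag(seq) != seq is TRUE on the first row of each run after the first, and lead(seq) != seq on the last row of each run before the last, which is where the pipeline looks at the neighbouring value.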
