Is there a way in R to create an ifelse over a defined number of consecutive rows?

Question

If I have:

df<-data.frame(group=c(1, 1,1, 1,1, 2, 2, 2, 4,4,4,4), 
              value=c("A","B","C","B","A","A","A","B","D","A","A","B"))

I want to make an ifelse statement (or equivalent) that checks whether any "3 in a row" window, starting from the first row within a group, has certain values. For example, in group 1 I want to scan ABC, then BCB, then CBA, and maybe make a 'want' column showing whether 'C' appears in any scan or not. Something like this:


  group value want_any_c want_any_b
1      1     A        yes        yes
2      1     B        yes        yes
3      1     C        yes        yes
4      1     B        yes        yes
5      1     A        yes        yes
6      2     A         no        yes
7      2     A         no        yes
8      2     B         no        yes
9      4     D         no        yes
10     4     A         no        yes
11     4     A         no        yes
12     4     B         no        yes

Follow-up: I also want to see whether EVERY scan of 3 contains a value, starting from the first row in a group, then the second row, and so on (i.e. group 1 scans ABC, BCB, CBA; group 2 scans AAB; and group 4 scans DAA, AAB). (Thanks, akrun):

  group value want_any_c want_any_b want_every_c want_every_b
1      1     A        yes        yes          yes          yes
2      1     B        yes        yes          yes          yes
3      1     C        yes        yes          yes          yes
4      1     B        yes        yes          yes          yes
5      1     A        yes        yes          yes          yes
6      2     A         no        yes           no          yes
7      2     A         no        yes           no          yes
8      2     B         no        yes           no          yes
9      4     D         no        yes           no           no
10     4     A         no        yes           no           no
11     4     A         no        yes           no           no
12     4     B         no        yes           no           no
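To make the definition of a "scan" concrete, here is a small base R sketch that lists the windows of 3 consecutive rows inside each group. The helper name windows_of is hypothetical (not from the question); it simply pastes each window together so it can be compared with the ABC, BCB, CBA / AAB / DAA, AAB lists above.

```r
df <- data.frame(group = c(1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 4, 4),
                 value = c("A","B","C","B","A","A","A","B","D","A","A","B"))

# Hypothetical helper: every run of k consecutive values, pasted together
windows_of <- function(v, k = 3) {
  v <- as.character(v)                 # works for factor or character input
  n_win <- max(length(v) - k + 1, 0)   # groups shorter than k have no window
  vapply(seq_len(n_win),
         function(i) paste(v[i:(i + k - 1)], collapse = ""),
         character(1))
}

# One character vector of windows per group, named by group
scans <- lapply(split(df$value, df$group), windows_of)
scans
```

Group 2 has exactly 3 rows, so it yields a single window; a group with fewer than 3 rows would yield none.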

Answer 1

We can use any or %in%:

library(dplyr)
df %>% 
   group_by(group) %>%
   mutate(want_any_c = c('no', 'yes')[('C' %in% value) + 1],
           want_any_b = c('no', 'yes')[('B' %in% value) + 1])
# A tibble: 12 x 4
# Groups:   group [3]
#   group value want_any_c want_any_b
#   <dbl> <fct> <chr>      <chr>     
# 1     1 A     yes        yes       
# 2     1 B     yes        yes       
# 3     1 C     yes        yes       
# 4     1 B     yes        yes       
# 5     1 A     yes        yes       
# 6     2 A     no         yes       
# 7     2 A     no         yes       
# 8     2 B     no         yes       
# 9     4 D     no         yes       
#10     4 A     no         yes       
#11     4 A     no         yes       
#12     4 B     no         yes
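The c('no', 'yes')[condition + 1] idiom works because TRUE and FALSE become 1 and 0 in arithmetic, so condition + 1 selects element 2 ('yes') or element 1 ('no'). A quick check against ifelse(), using the group 1 values from the question as sample input:

```r
v <- c("A", "B", "C", "B", "A")   # group 1 values from the question

# TRUE + 1 == 2 picks 'yes'; FALSE + 1 == 1 picks 'no'
has_c <- c('no', 'yes')[('C' %in% v) + 1]
has_d <- c('no', 'yes')[('D' %in% v) + 1]

# The same trick is vectorised and agrees with ifelse()
flags  <- c(TRUE, FALSE, TRUE)
idiom  <- c('no', 'yes')[flags + 1]
via_if <- ifelse(flags, 'yes', 'no')
```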

If it is every scan of 3 values, use rollapply from zoo to test each rolling window of 3 within a group, and combine the window results with all():

library(zoo)
df %>%
 group_by(group) %>%
  mutate(want_any_c = c('no', 'yes')[('C' %in% value) + 1],
        want_any_b = c('no', 'yes')[('B' %in% value) + 1],
        want_every_c = c('no', 'yes')[(all(rollapply(value, 3,
             FUN = function(x) 'C' %in% x))) + 1],
        want_every_b = c('no', 'yes')[(all(rollapply(value, 3, 
             FUN = function(x) 'B' %in% x))) + 1])
# A tibble: 12 x 6
# Groups:   group [3]
#   group value want_any_c want_any_b want_every_c want_every_b
#   <dbl> <fct> <chr>      <chr>      <chr>        <chr>       
# 1     1 A     yes        yes        yes          yes         
# 2     1 B     yes        yes        yes          yes         
# 3     1 C     yes        yes        yes          yes         
# 4     1 B     yes        yes        yes          yes         
# 5     1 A     yes        yes        yes          yes         
# 6     2 A     no         yes        no           yes         
# 7     2 A     no         yes        no           yes         
# 8     2 B     no         yes        no           yes         
# 9     4 D     no         yes        no           no          
#10     4 A     no         yes        no           no          
#11     4 A     no         yes        no           no          
#12     4 B     no         yes        no           no
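To see what all() reduces over, here is a base R sketch (no zoo needed) that returns one TRUE/FALSE per window of 3 consecutive values. The helper name in_each_window is hypothetical; it is applied to group 4 (D, A, A, B) from the question.

```r
# Hypothetical helper: does each window of k consecutive values contain key?
in_each_window <- function(v, key, k = 3) {
  n_win <- max(length(v) - k + 1, 0)   # no complete window if length(v) < k
  vapply(seq_len(n_win),
         function(i) key %in% v[i:(i + k - 1)],
         logical(1))
}

g4 <- c("D", "A", "A", "B")       # group 4 from the question
hits <- in_each_window(g4, "B")   # windows DAA and AAB
every_b <- all(hits)              # FALSE, because DAA has no B
any_b   <- any(hits)              # TRUE, because AAB has a B
```

This is the per-window vector that the rollapply() call builds inside each group before all() collapses it to a single value.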

As this is done for multiple values, a function would be more useful:

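library(zoo)

# "yes" if val occurs anywhere in the vector, else "no"
f1 <- function(colNm, val){
  c('no', 'yes')[(val %in% colNm) + 1]
}

# "yes" if val occurs in every rolling window of 3 values, else "no"
f2 <- function(colNm, val){
  c('no', 'yes')[(all(rollapply(colNm, 3,
       FUN = function(x) val %in% x))) + 1]
}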

df %>%
    group_by(group) %>%
    mutate(want_any_c = f1(value, "C"), 
           want_any_b = f1(value, "B"),
           want_every_c = f2(value, "C"),
           want_every_b = f2(value, "B"))

Answer 2

Here's a data.table solution:

library(zoo)
library(data.table)
setDT(df)

to_check <- c('C', 'B')

df[, paste0('want_any_', to_check) := lapply(to_check, '%in%', value),
   by = group]


df[, paste0('want_every_', to_check) := 
      lapply(to_check, function(x) all(rollapply(value, 3, '%in%', x = x))),
   by = group]

df
#     group value want_any_C want_any_B want_every_C want_every_B
#  1:     1     A       TRUE       TRUE         TRUE         TRUE
#  2:     1     B       TRUE       TRUE         TRUE         TRUE
#  3:     1     C       TRUE       TRUE         TRUE         TRUE
#  4:     1     B       TRUE       TRUE         TRUE         TRUE
#  5:     1     A       TRUE       TRUE         TRUE         TRUE
#  6:     2     A      FALSE       TRUE        FALSE         TRUE
#  7:     2     A      FALSE       TRUE        FALSE         TRUE
#  8:     2     B      FALSE       TRUE        FALSE         TRUE
#  9:     4     D      FALSE       TRUE        FALSE        FALSE
# 10:     4     A      FALSE       TRUE        FALSE        FALSE
# 11:     4     A      FALSE       TRUE        FALSE        FALSE
# 12:     4     B      FALSE       TRUE        FALSE        FALSE

Or, as yes/no:

want_cols <- grep('want', names(df), value = TRUE)

df[,  (want_cols) := lapply(mget(want_cols), ifelse, 'yes', 'no')]

df
#     group value want_any_C want_any_B want_every_C want_every_B
#  1:     1     A        yes        yes          yes          yes
#  2:     1     B        yes        yes          yes          yes
#  3:     1     C        yes        yes          yes          yes
#  4:     1     B        yes        yes          yes          yes
#  5:     1     A        yes        yes          yes          yes
#  6:     2     A         no        yes           no          yes
#  7:     2     A         no        yes           no          yes
#  8:     2     B         no        yes           no          yes
#  9:     4     D         no        yes           no           no
# 10:     4     A         no        yes           no           no
# 11:     4     A         no        yes           no           no
# 12:     4     B         no        yes           no           no

If you have millions of rows, the rollapply approach might be slow. I don't think it's necessary; there's probably a solution based on checking diff(which(value == 'C')), which I can't figure out at the moment.
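One way to follow up on the diff(which(value == 'C')) idea, sketched in base R under the assumption that windows have width 3 and values are compared with ==: every window of 3 contains the key exactly when there is no run of 3 or more non-matching values. That holds when the gaps between consecutive match positions, padded with sentinel positions 0 and n + 1, are all at most 3. The function name every_window_has is hypothetical.

```r
# Hypothetical helper: TRUE if every window of k consecutive values contains key.
# Equivalent condition: no run of k or more non-matching values, i.e. all gaps
# between match positions (with sentinels 0 and n + 1) are at most k.
every_window_has <- function(v, key, k = 3) {
  n <- length(v)
  if (n < k) return(TRUE)          # no complete window: vacuously TRUE, like all(logical(0))
  pos <- which(v == key)
  gaps <- diff(c(0, pos, n + 1))   # leading gap, inner gaps, trailing gap
  all(gaps <= k)
}

df <- data.frame(group = c(1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 4, 4),
                 value = c("A","B","C","B","A","A","A","B","D","A","A","B"))

# One TRUE/FALSE per group, then expanded back to one value per row
res_c <- tapply(df$value, df$group, every_window_has, key = "C")
res_b <- tapply(df$value, df$group, every_window_has, key = "B")
df$want_every_c <- ifelse(as.vector(res_c[as.character(df$group)]), "yes", "no")
df$want_every_b <- ifelse(as.vector(res_b[as.character(df$group)]), "yes", "no")
```

This avoids calling a function once per window, which is the likely cost of rollapply on very large data; no timings are claimed here.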

Answer 3

Here is a base R solution, where you first define the function want as below:

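want <- function(v, key, f) {
  # For each window of 3 consecutive values: does it contain key?
  # seq_len() avoids the seq(0) == c(1, 0) pitfall for groups with fewer than 3 rows
  u <- vapply(seq_len(max(length(v) - 2, 0)),
              function(k) key %in% v[k + 0:2], logical(1))
  switch(f,
         "any"   = rep(ifelse(any(u), "Yes", "No"), length(v)),
         "every" = rep(ifelse(all(u), "Yes", "No"), length(v)))
}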

and then you get the desired output with the following code:

dfout <- cbind(df,do.call(rbind, c(make.row.names = F,
                                   lapply(split(df,df$group), function(v) data.frame(
                                       want_any_c = want(v$value,"C","any"),
                                       want_any_b = want(v$value,"B","any"),
                                       want_every_c = want(v$value,"C","every"),
                                       want_every_b = want(v$value,"B","every"))))))

such that

> dfout
   group value want_any_c want_any_b want_every_c want_every_b
1      1     A        Yes        Yes          Yes          Yes
2      1     B        Yes        Yes          Yes          Yes
3      1     C        Yes        Yes          Yes          Yes
4      1     B        Yes        Yes          Yes          Yes
5      1     A        Yes        Yes          Yes          Yes
6      2     A         No        Yes           No          Yes
7      2     A         No        Yes           No          Yes
8      2     B         No        Yes           No          Yes
9      4     D         No        Yes           No           No
10     4     A         No        Yes           No           No
11     4     A         No        Yes           No           No
12     4     B         No        Yes           No           No

Answer 4

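A base R approach that doesn't require hardcoding individual values as vectors, matching them, etc. Note that, as written, it assigns rows to non-overlapping blocks of three (counted across the whole data frame rather than restarting in each group) and adds one TRUE/FALSE column per distinct value, rather than checking rolling windows such as ABC, BCB, CBA: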

# Create a group of each grouping var every three rows:
n = 3
df$group2 <- paste0(df$group, " - ",
                    ave(rep(1:n, ceiling(nrow(df)/n)),
                        rep(1:n, ceiling(nrow(df)/n)),
                        FUN = seq.int)[1:nrow(df)])

# Row-wise concatenate the unique values per group:
values_by_group <- aggregate(value ~ group2, df,
                             FUN = function(x) paste0(unique(sort(x)), collapse = ", "))

# Add a column per unique value in df's value vector:
values_by_group <- cbind(values_by_group,
                         setNames(data.frame(matrix(NA, nrow = nrow(values_by_group),
                                                    ncol = length(unique(df$value)))),
                                  unique(as.character(df$value))))

# Store a logical index of the values_by_group columns
# matching the values in the original data frame:
vec_idx <- names(values_by_group) %in% unique(as.character(df$value))

# Flag, for each block, which of those values it contains:
values_by_group[, vec_idx] <-
  t(vapply(strsplit(as.character(values_by_group$value), ", "),
           function(x) names(values_by_group)[vec_idx] %in% x,
           logical(sum(vec_idx))))

# Merge with the original data frame, drop the helper grouping column:
final_df <- within(merge(df,
                         values_by_group[, names(values_by_group) != "value"],
                         by = "group2",
                         all.x = TRUE),
                   rm("group2"))

Data:

df <- data.frame(group = c(1, 1,1, 1,1, 2, 2, 2, 4,4,4,4),
                value = c("A","B","C","B","A","A","A","B","D","A","A","B"))


4 answers

Answer 1
Score 3, accepted, 2020-01-03 20:21:52

Answer 2
Score 2, 2020-01-03 21:04:05

Answer 3
Score 1, 2020-01-03 21:35:11

Answer 4
Score 0, 2020-01-06 10:25:11
