Is there a way in R to make an ifelse on defined consecutive rows?
If I have:
df<-data.frame(group=c(1, 1,1, 1,1, 2, 2, 2, 4,4,4,4),
value=c("A","B","C","B","A","A","A","B","D","A","A","B"))
I want to make an ifelse statement or equivalent that checks whether any "3 in a row", starting from the first row within a group, has certain values. So for example, starting in group 1 I want to scan ABC, then BCB, then CBA, and maybe make a 'want' column showing whether 'C' shows up in any of those scans. Something like this:
group value want_any_c want_any_b
1 1 A yes yes
2 1 B yes yes
3 1 C yes yes
4 1 B yes yes
5 1 A yes yes
6 2 A no yes
7 2 A no yes
8 2 B no yes
9 4 D no yes
10 4 A no yes
11 4 A no yes
12 4 B no yes
Follow-up: I also want to see whether EVERY scan of 3 contains a value, starting from the first row in a group, then moving on to the next group, etc. (i.e. group 1 scans ABC, BCB, CBA; group 2 scans AAB; and group 4 scans DAA, AAB). (ty akrun):
group value want_any_c want_any_b want_every_c want_every_b
1 1 A yes yes yes yes
2 1 B yes yes yes yes
3 1 C yes yes yes yes
4 1 B yes yes yes yes
5 1 A yes yes yes yes
6 2 A no yes no yes
7 2 A no yes no yes
8 2 B no yes no yes
9 4 D no yes no no
10 4 A no yes no no
11 4 A no yes no no
12 4 B no yes no no
We can use any or %in%:
library(dplyr)
df %>%
group_by(group) %>%
mutate(want_any_c = c('no', 'yes')[('C' %in% value) + 1],
want_any_b = c('no', 'yes')[('B' %in% value) + 1])
# A tibble: 12 x 4
# Groups: group [3]
# group value want_any_c want_any_b
# <dbl> <fct> <chr> <chr>
# 1 1 A yes yes
# 2 1 B yes yes
# 3 1 C yes yes
# 4 1 B yes yes
# 5 1 A yes yes
# 6 2 A no yes
# 7 2 A no yes
# 8 2 B no yes
# 9 4 D no yes
#10 4 A no yes
#11 4 A no yes
#12 4 B no yes
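The c('no', 'yes')[cond + 1] idiom used above works because adding 1 to a logical turns FALSE into 1 and TRUE into 2, and those numbers then index into the two-element vector. A minimal base R sketch of just that step (the helper name yes_no is made up for illustration and is not part of the answer):

```r
# Hypothetical helper: map a logical vector to "no"/"yes" by indexing.
yes_no <- function(cond) {
  # FALSE + 1 is 1 (picks "no"), TRUE + 1 is 2 (picks "yes")
  c("no", "yes")[cond + 1]
}

value <- c("A", "A", "B")
yes_no("C" %in% value)          # "no"
yes_no("B" %in% value)          # "yes"
yes_no(c(TRUE, FALSE, TRUE))    # "yes" "no" "yes"
```

Because %in% returns a single TRUE or FALSE for the whole group here, mutate recycles the one answer to every row of that group.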
If it is every scan of 3 values, use rollapply from zoo to test each rolling window of 3 within the group, and wrap the result in all:
library(zoo)
df %>%
group_by(group) %>%
mutate(want_any_c = c('no', 'yes')[('C' %in% value) + 1],
want_any_b = c('no', 'yes')[('B' %in% value) + 1],
want_every_c = c('no', 'yes')[(all(rollapply(value, 3,
FUN = function(x) 'C' %in% x))) + 1],
want_every_b = c('no', 'yes')[(all(rollapply(value, 3,
FUN = function(x) 'B' %in% x))) + 1])
# A tibble: 12 x 6
# Groups: group [3]
# group value want_any_c want_any_b want_every_c want_every_b
# <dbl> <fct> <chr> <chr> <chr> <chr>
# 1 1 A yes yes yes yes
# 2 1 B yes yes yes yes
# 3 1 C yes yes yes yes
# 4 1 B yes yes yes yes
# 5 1 A yes yes yes yes
# 6 2 A no yes no yes
# 7 2 A no yes no yes
# 8 2 B no yes no yes
# 9 4 D no yes no no
#10 4 A no yes no no
#11 4 A no yes no no
#12 4 B no yes no no
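To see which windows the rolling scan actually looks at, here is a small sketch in plain base R, so it runs without zoo. It is only an illustration of the idea, not how rollapply is implemented, and the helper name windows is an assumption of this sketch:

```r
# Hypothetical helper: list the rolling windows of `width` consecutive values.
windows <- function(v, width = 3) {
  n <- length(v)
  if (n < width) return(character(0))   # too short: nothing to scan
  vapply(seq_len(n - width + 1),
         function(k) paste(v[k:(k + width - 1)], collapse = ""),
         character(1))
}

value <- c("A", "B", "C", "B", "A", "A", "A", "B", "D", "A", "A", "B")
group <- c(1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 4, 4)

# One character vector of scans per group
scans <- lapply(split(value, group), windows)
scans[["1"]]   # "ABC" "BCB" "CBA"
scans[["2"]]   # "AAB"
scans[["4"]]   # "DAA" "AAB"

# "every scan contains C" for group 1
all(grepl("C", scans[["1"]], fixed = TRUE))   # TRUE
```

In this sketch a group with fewer than 3 rows has no windows at all, so all() over its (empty) scans is TRUE; decide whether that is the answer you want for short groups.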
As it is done on multiple values, a function would be more useful:
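# f1: "yes" if val occurs anywhere in the column x, else "no"
f1 <- function(x, val){
  c('no', 'yes')[(val %in% x) + 1]
}
# f2: "yes" if every rolling window of 3 in x contains val, else "no"
f2 <- function(x, val){
  c('no', 'yes')[(all(rollapply(x, 3,
      FUN = function(w) val %in% w))) + 1]
}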
df %>%
group_by(group) %>%
mutate(want_any_c = f1(value, "C"),
want_any_b = f1(value, "B"),
want_every_c = f2(value, "C"),
want_every_b = f2(value, "B"))
Here's a data.table solution:
library(zoo)
library(data.table)
setDT(df)
to_check <- c('C', 'B')
df[, paste0('want_any_', to_check) := lapply(to_check, '%in%', value),
by = group]
df[, paste0('want_every_', to_check) :=
lapply(to_check, function(x) all(rollapply(value, 3, '%in%', x = x))),
by = group]
df
# group value want_any_C want_any_B want_every_C want_every_B
# 1: 1 A TRUE TRUE TRUE TRUE
# 2: 1 B TRUE TRUE TRUE TRUE
# 3: 1 C TRUE TRUE TRUE TRUE
# 4: 1 B TRUE TRUE TRUE TRUE
# 5: 1 A TRUE TRUE TRUE TRUE
# 6: 2 A FALSE TRUE FALSE TRUE
# 7: 2 A FALSE TRUE FALSE TRUE
# 8: 2 B FALSE TRUE FALSE TRUE
# 9: 4 D FALSE TRUE FALSE FALSE
# 10: 4 A FALSE TRUE FALSE FALSE
# 11: 4 A FALSE TRUE FALSE FALSE
# 12: 4 B FALSE TRUE FALSE FALSE
Or as yes/no:
want_cols <- grep('want', names(df), value = TRUE)
df[, (want_cols) := lapply(mget(want_cols), ifelse, 'yes', 'no')]
df
# group value want_any_C want_any_B want_every_C want_every_B
# 1: 1 A yes yes yes yes
# 2: 1 B yes yes yes yes
# 3: 1 C yes yes yes yes
# 4: 1 B yes yes yes yes
# 5: 1 A yes yes yes yes
# 6: 2 A no yes no yes
# 7: 2 A no yes no yes
# 8: 2 B no yes no yes
# 9: 4 D no yes no no
# 10: 4 A no yes no no
# 11: 4 A no yes no no
# 12: 4 B no yes no no
If you have millions of rows the rollapply approach might be slow. I don't think it's necessary; there's probably a solution that checks diff(which(value == 'C')) (which I can't figure out at the moment).
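One way the diff(which(value == 'C')) idea could be made to work is sketched below. It rests on this observation: every window of 3 contains the key exactly when there is no run of 3 consecutive rows without it, i.e. when consecutive key positions (plus a sentinel just before the first row and one just after the last) are never more than 3 apart. The function name every_window_has and the width argument are assumptions of this sketch, and it has not been benchmarked against rollapply:

```r
# Hypothetical helper: TRUE if every window of `width` consecutive
# elements of v contains `key`.
every_window_has <- function(v, key, width = 3) {
  # Positions of `key`, padded with sentinels just outside both ends
  pos <- c(0, which(v == key), length(v) + 1)
  # A gap larger than `width` means some window of `width` rows lacks `key`
  all(diff(pos) <= width)
}

df <- data.frame(group = c(1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 4, 4),
                 value = c("A","B","C","B","A","A","A","B","D","A","A","B"))

# One TRUE/FALSE per group, named by group
res_c <- tapply(df$value, df$group, every_window_has, key = "C")
res_b <- tapply(df$value, df$group, every_window_has, key = "B")

# Broadcast the per-group answer back to the rows as yes/no
idx <- as.character(df$group)
df$want_every_c <- c("no", "yes")[as.vector(res_c[idx]) + 1]
df$want_every_b <- c("no", "yes")[as.vector(res_b[idx]) + 1]
df
```

Like all() over an empty set of windows, this returns TRUE for a group with fewer than 3 rows.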
Here is a base R solution, where you first define the function want as below:
want <- function(v,key,f) {
u <- sapply(seq(length(v)-2),function(k) key %in% v[k+0:2])
switch (f,
"any" = rep(ifelse(any(u),"Yes","No"),length(v)),
"every" = rep(ifelse(all(u),"Yes","No"),length(v))
)
}
and then you will get the desired output through the following code:
dfout <- cbind(df,do.call(rbind, c(make.row.names = F,
lapply(split(df,df$group), function(v) data.frame(
want_any_c = want(v$value,"C","any"),
want_any_b = want(v$value,"B","any"),
want_every_c = want(v$value,"C","every"),
want_every_b = want(v$value,"B","every"))))))
such that
> dfout
group value want_any_c want_any_b want_every_c want_every_b
1 1 A Yes Yes Yes Yes
2 1 B Yes Yes Yes Yes
3 1 C Yes Yes Yes Yes
4 1 B Yes Yes Yes Yes
5 1 A Yes Yes Yes Yes
6 2 A No Yes No Yes
7 2 A No Yes No Yes
8 2 B No Yes No Yes
9 4 D No Yes No No
10 4 A No Yes No No
11 4 A No Yes No No
12 4 B No Yes No No
Base R, but doesn't require hardcoding individual values as vectors, matching them, etc.:
# Create a group of each grouping var every three rows:
n = 3
df$group2 <- paste0(df$group,
" - ",
ave(rep(1:n, ceiling(nrow(df)/n)),
rep(1:n, ceiling(nrow(df)/n)),
FUN = seq.int)[1:nrow(df)])
# Row-wise concatenate the unique values per group:
values_by_group <- aggregate(value~group2, df, FUN =
function(x){
paste0(unique(sort(x)),
collapse = ", ")})
# Add a vector per each unique value in df's value vector:
values_by_group <- cbind(values_by_group,
setNames(data.frame(matrix(NA, nrow = nrow(values_by_group),
ncol = length(unique(df$value)))),
c(unique(sapply(df$value, as.character)))))
# Store a vector of indices of values_by_group table
# matching the values in the original dataframe:
vec_idx <- names(values_by_group) %in% unique(sapply(df$value, as.character))
# Match vector names with values in value vector:
values_by_group[,vec_idx] <-
t(vapply(strsplit(as.character(values_by_group$value), ', '),
function(x){
names(values_by_group)[c(vec_idx)] %in% x
},
logical(ncol(values_by_group)-sum(!(vec_idx)))
)
)
# Merge with the original dataframe, drop unwanted grouping vec:
final_df <- within(merge(df,
values_by_group[,names(values_by_group) != "value"],
by = "group2",
all.x = TRUE), rm("group2"))
Data:
df <- data.frame(group = c(1, 1,1, 1,1, 2, 2, 2, 4,4,4,4),
value = c("A","B","C","B","A","A","A","B","D","A","A","B"))