Apply a conditional on two groups of columns within a dataframe
I have a df:
a<-c(5,1,5,3,5,3,5,1)
b<-c(1,5,1,5,1,5,3,5)
df<-as.data.frame(rbind(a,b))
names(df)<-c('pre1','post1','pre2','post2','pre3','post3','pre4','post4')
The columns hold two groups of samples, 'pre' and 'post':
pre<-seq(1,8,by=2)
post<-seq(2,8,by=2)
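I want to apply a condition that requires either 100% of 'pre' and 50% of 'post', or 50% of 'pre' and 100% of 'post'.
For example:
If 100% of the 'pre' values are 3 or above and at least 50% of the 'post' values are 3 or above, keep the row; or if at least 50% of the 'pre' values are 3 or above and 100% of the 'post' values are 3 or above, keep the row. In the example df, only row 'a' would remain.
I have: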
test<- ((df[apply(df[pre],1,function(x) sum(x>=3)/length(x)),] &
df[apply(df[post],1,function(x) sum(x>3)/length(x))>=0.5,]) |
(df[apply(df[pre],1,function(x) sum(x>3)/length(x))>=0.5,] &
df[apply(df[post],1,function(x) sum(x>3)/length(x)),]))
But I get a vector of 'TRUE', which is not what I want.
We can create a logical vector with rowSums to do the comparison:
df[(rowSums(df[pre] >= 3)/length(pre) == 1) &
(rowSums(df[post] >= 3)/length(post) >= 0.5) |
(rowSums(df[post] >= 3)/length(post) == 1) &
(rowSums(df[pre] >= 3)/length(pre) >= 0.5), ]
#  pre1 post1 pre2 post2 pre3 post3 pre4 post4
#a    5     1    5     3    5     3    5     1
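The same test can also be written with rowMeans, which returns the share of passing values per row directly. The following is only a minimal sketch: the helper name keep_rows and its arguments cutoff and partial are hypothetical and do not come from the original answer.

```r
# Example data from the question
a <- c(5, 1, 5, 3, 5, 3, 5, 1)
b <- c(1, 5, 1, 5, 1, 5, 3, 5)
df <- as.data.frame(rbind(a, b))
names(df) <- c('pre1', 'post1', 'pre2', 'post2', 'pre3', 'post3', 'pre4', 'post4')
pre  <- seq(1, 8, by = 2)
post <- seq(2, 8, by = 2)

# Hypothetical helper: TRUE for rows where one group passes completely
# and the other group passes in at least `partial` of its columns
keep_rows <- function(data, pre, post, cutoff = 3, partial = 0.5) {
  p_pre  <- rowMeans(data[pre]  >= cutoff)  # share of 'pre' values >= cutoff
  p_post <- rowMeans(data[post] >= cutoff)  # share of 'post' values >= cutoff
  (p_pre == 1 & p_post >= partial) | (p_post == 1 & p_pre >= partial)
}

keep <- keep_rows(df, pre, post)
result <- df[keep, ]
```

Keeping the logical vector in its own variable (keep here) makes it easy to inspect which rows pass before subsetting.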
Using apply, we can do:
df[apply(df[pre] >= 3, 1, all) & apply(df[post] >= 3, 1, sum)/length(post) >= 0.5 |
apply(df[post] >= 3, 1, all) & apply(df[pre] >= 3, 1, sum)/length(pre) >= 0.5, ]
Here is a not-so-concise tidyverse solution that could probably be shortened considerably.
library(tidyverse)
pass_val = 3
df %>%
  rownames_to_column() %>%
  gather(col, val, -rowname) %>%
  separate("col", c("type", "num"), sep = -1) %>%
  count(rowname, type, pass = val >= pass_val) %>%
  spread(pass, n, fill = 0) %>%
  transmute(rowname, type, pass_pct = `TRUE`/(`TRUE` + `FALSE`)) %>%
  spread(type, pass_pct) %>%
  filter(post == 1 & pre >= 0.5 | post >= 0.5 & pre == 1)
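One possible way to shorten the pipeline above is to compute the pass rate with mean() on the logical comparison instead of counting TRUE and FALSE separately. This is a sketch rather than the answerer's own code; it assumes tidyr 1.0 or later (for pivot_longer and pivot_wider) and dplyr 1.0 or later (for the .groups argument), and the regular expression assumes column names made of lowercase letters followed by digits.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question
a <- c(5, 1, 5, 3, 5, 3, 5, 1)
b <- c(1, 5, 1, 5, 1, 5, 3, 5)
df <- as.data.frame(rbind(a, b))
names(df) <- c('pre1', 'post1', 'pre2', 'post2', 'pre3', 'post3', 'pre4', 'post4')

pass_val <- 3

result <- df %>%
  rownames_to_column() %>%
  # split names such as "pre1" into type = "pre" and num = "1"
  pivot_longer(-rowname, names_to = c("type", "num"),
               names_pattern = "([a-z]+)([0-9]+)", values_to = "val") %>%
  group_by(rowname, type) %>%
  # share of values in each row/type group that reach the cutoff
  summarise(pass_pct = mean(val >= pass_val), .groups = "drop") %>%
  pivot_wider(names_from = type, values_from = pass_pct) %>%
  filter(post == 1 & pre >= 0.5 | post >= 0.5 & pre == 1)
```

Like the pipeline above, this returns a summary table with one line per passing row name, not the original columns. The original rows can then be recovered with df[result$rowname, ].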
Here is one option with tidyverse:
library(tidyverse)
library(rap)
crossing(val = c(0.5, 1), cols = c("pre", "post")) %>%
  rap(x = ~ df %>%
        select(matches(cols)) %>%
        {rowMeans(. >=3) >= val}) %>%
  group_by(val) %>%
  transmute(ind = reduce(x, `&`)) %>%
  filter(any(ind)) %>%
  pull(ind) %>%
  filter(df, .)
#  pre1 post1 pre2 post2 pre3 post3 pre4 post4
#1    5     1    5     3    5     3    5     1
Here is a base R solution that splits the data frame by row name, uses sapply to check the condition, and uses the output as a logical index on df:
df[sapply(split(df, rownames(df)), function(x) {
  (sum(x[pre] > 2)/ncol(x[pre]) >= .5) & (sum(x[post] > 2)/ncol(x[post]) == 1) ||
  (sum(x[pre] > 2)/ncol(x[pre]) == 1) & (sum(x[post] > 2)/ncol(x[post]) >= .5)
}),]
#### OUTPUT ####
  pre1 post1 pre2 post2 pre3 post3 pre4 post4
a    5     1    5     3    5     3    5     1
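One caveat with the split() approach: split() orders its pieces by the sorted levels of the row names, so the resulting logical vector only lines up with df when the row names are already in sorted order, as they are in this example. A variant that iterates over row positions avoids that dependency. The following is a sketch under that assumption, not part of the original answer; it also uses >= 3 to match the wording of the question, which gives the same result as > 2 for whole numbers.

```r
# Example data from the question
a <- c(5, 1, 5, 3, 5, 3, 5, 1)
b <- c(1, 5, 1, 5, 1, 5, 3, 5)
df <- as.data.frame(rbind(a, b))
names(df) <- c('pre1', 'post1', 'pre2', 'post2', 'pre3', 'post3', 'pre4', 'post4')
pre  <- seq(1, 8, by = 2)
post <- seq(2, 8, by = 2)

# Check each row by position, so the result is always in the same order as df
keep <- vapply(seq_len(nrow(df)), function(i) {
  p_pre  <- mean(unlist(df[i, pre])  >= 3)  # share of 'pre' values >= 3
  p_post <- mean(unlist(df[i, post]) >= 3)  # share of 'post' values >= 3
  (p_pre >= 0.5 && p_post == 1) || (p_pre == 1 && p_post >= 0.5)
}, logical(1))

result <- df[keep, ]
```

vapply with logical(1) also guarantees that each row yields exactly one TRUE or FALSE, which sapply does not check.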