Delete column containing two strings and a factor occurring once or twice in R
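I have the following matrix of strings made up of 0s and 1s. Within each column every string has the same length, and the minimum length is 2. I would like to delete a column when it meets both of these conditions: it contains only two distinct strings (10 and 01), and 01 occurs only once or twice. All other columns should be kept:
r1 <- c("10","001","0001","01","100","10")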
r2 <- c("01","001","0001","10","100","10")
r3 <- c("10","100","1000","10","010","01")
r4 <- c("10","010","0100","10","001","10")
r5 <- c("01","010","0010","10","001","10")
r6 <- c("01","010","0010","10","001","01")
n.mat <- rbind(r1,r2,r3,r4,r5,r6)
Desired output:
r1 <- c("10","001","0001","100")
r2 <- c("01","001","0001","100")
r3 <- c("10","100","1000","010")
r4 <- c("10","010","0100","001")
r5 <- c("01","010","0010","001")
r6 <- c("01","010","0010","001")
n.mat <- rbind(r1,r2,r3,r4,r5,r6)
Columns 4 and 6 are removed.
My code so far is:
del_two <- function(x) {
  length(unique(x)) != 2
}
msa_protein.mat_1 <- msa_protein.mat[, apply(msa_protein.mat, 2, del_two)]
But I am not quite sure how to add the if condition.
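To see why the single test is not enough, here is a minimal sketch that applies `del_two` to the sample `n.mat` rather than to `msa_protein.mat` (which is not shown in the post, so using `n.mat` here is an assumption). It also flags column 1, which should be kept because `01` occurs three times there:

```r
# Sample matrix from the question
r1 <- c("10","001","0001","01","100","10")
r2 <- c("01","001","0001","10","100","10")
r3 <- c("10","100","1000","10","010","01")
r4 <- c("10","010","0100","10","001","10")
r5 <- c("01","010","0010","10","001","10")
r6 <- c("01","010","0010","10","001","01")
n.mat <- rbind(r1, r2, r3, r4, r5, r6)

# The question's test: keep a column unless it has exactly two distinct strings
del_two <- function(x) length(unique(x)) != 2

keep <- apply(n.mat, 2, del_two)
which(!keep)  # columns dropped by this test alone: 1 4 6
```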
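You can add & to combine logical selections with "AND" logic. In this case, though, I think you want to drop these columns rather than keep them, so you need to negate the final selection with !: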
n.mat[, apply(n.mat, 2, FUN=function(x) !(length(unique(x)) == 2 & sum(x == '01') <= 2))]
Or even:
n.mat[, !apply(n.mat, 2, FUN=function(x) length(unique(x)) == 2 & sum(x == '01') <= 2)]
You could also express it as the logical conditions failing, combined with | for "OR" logic:
n.mat[, apply(n.mat, 2, FUN=function(x) length(unique(x)) != 2 | sum(x == '01') > 2)]
All of these give:
#   [,1] [,2]  [,3]   [,4]
#r1 "10" "001" "0001" "100"
#r2 "01" "001" "0001" "100"
#r3 "10" "100" "1000" "010"
#r4 "10" "010" "0100" "001"
#r5 "01" "010" "0010" "001"
#r6 "01" "010" "0010" "001"
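The negated "AND" form and the "OR" form select the same columns by De Morgan's law. As a minimal sketch, assuming the `n.mat` from the question, the condition can be pulled into a named helper and the equivalence checked directly. The names `is_droppable` and `flagged` are hypothetical and do not appear in the original post:

```r
# Sample matrix from the question
r1 <- c("10","001","0001","01","100","10")
r2 <- c("01","001","0001","10","100","10")
r3 <- c("10","100","1000","10","010","01")
r4 <- c("10","010","0100","10","001","10")
r5 <- c("01","010","0010","10","001","10")
r6 <- c("01","010","0010","10","001","01")
n.mat <- rbind(r1, r2, r3, r4, r5, r6)

# Hypothetical helper: TRUE when a column has exactly two distinct strings
# and "01" occurs at most twice
is_droppable <- function(x) length(unique(x)) == 2 && sum(x == "01") <= 2

flagged <- apply(n.mat, 2, is_droppable)
which(flagged)  # columns to remove: 4 6

# Negated "AND" form versus "OR" form; drop = FALSE keeps a matrix even if
# only one column survives
res_not <- n.mat[, !flagged, drop = FALSE]
res_or  <- n.mat[, apply(n.mat, 2, function(x)
  length(unique(x)) != 2 || sum(x == "01") > 2), drop = FALSE]
identical(res_not, res_or)
```

Because each test inside the helper yields a single TRUE or FALSE, the scalar operators `&&` and `||` are used here; the vectorised `&` and `|` from the answer behave the same way on length-one values.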
There may also be some trickier approaches using column sums, which might be faster if you have a lot of data, e.g.:
n.mat[, !(
(colSums(n.mat == "01") <= 2) &
colSums(matrix(n.mat %in% c("10","01"), nrow=nrow(n.mat), ncol=ncol(n.mat))) == nrow(n.mat)
)]
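One caveat, offered as a hedged observation: the column-sum version tests that every entry is `10` or `01`, whereas the `apply` versions test for exactly two distinct strings. On the sample data both flag the same columns, but they can disagree on, for example, a column made up only of `10`. The sketch below illustrates this; the helper names `by_apply` and `by_sums` and the extra all-`10` column are invented here for illustration, and `m == "10" | m == "01"` is used as a shape-preserving stand-in for the `matrix(... %in% ...)` call:

```r
# Sample matrix from the question
r1 <- c("10","001","0001","01","100","10")
r2 <- c("01","001","0001","10","100","10")
r3 <- c("10","100","1000","10","010","01")
r4 <- c("10","010","0100","10","001","10")
r5 <- c("01","010","0010","10","001","10")
r6 <- c("01","010","0010","10","001","01")
n.mat <- rbind(r1, r2, r3, r4, r5, r6)

# Hypothetical helpers returning TRUE for columns that would be removed
by_apply <- function(m) apply(m, 2, function(x)
  length(unique(x)) == 2 && sum(x == "01") <= 2)
by_sums <- function(m)
  colSums(m == "01") <= 2 & colSums(m == "10" | m == "01") == nrow(m)

# Same columns flagged on the sample data
all(by_apply(n.mat) == by_sums(n.mat))

# Illustrative extra column containing only "10"
extra <- cbind(n.mat, "10")
unname(by_apply(extra)[7])  # FALSE: only one distinct string
unname(by_sums(extra)[7])   # TRUE: all entries are "10"/"01" and "01" occurs 0 times
```

Which behaviour is wanted for such a column depends on the data, so it may be worth deciding that explicitly before switching to the faster form.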