
Delete columns containing two strings and a factor occurring once or twice in R

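I have the following matrix of strings made up of 0s and 1s, and every column always contains the same number of strings. A column contains at least 2 distinct strings. I want to delete a column when it meets both of the following conditions: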

  1. It contains only two distinct strings ("10" and "01"),
  2. "01" occurs only once or twice.

I want to keep all other columns:

    r1 <- c("10","001","0001","01","100","10")
    r2 <- c("01","001","0001","10","100","10")
    r3 <- c("10","100","1000","10","010","01")
    r4 <- c("10","010","0100","10","001","10")
    r5 <- c("01","010","0010","10","001","10")
    r6 <- c("01","010","0010","10","001","01")
    
    n.mat <- rbind(r1,r2,r3,r4,r5,r6)

Desired output:

    r1 <- c("10","001","0001","100")
    r2 <- c("01","001","0001","100")
    r3 <- c("10","100","1000","010")
    r4 <- c("10","010","0100","001")
    r5 <- c("01","010","0010","001")
    r6 <- c("01","010","0010","001")
    
    n.mat <- rbind(r1,r2,r3,r4,r5,r6)

Columns 4 and 6 are deleted.

My code so far is:

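    # TRUE for columns that do not contain exactly two distinct strings
    del_two <- function(x) {
      length(unique(x)) != 2
    }
    # Keep only the columns for which del_two() returns TRUE
    msa_protein.mat_1 <- msa_protein.mat[, apply(msa_protein.mat, 2, del_two)]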

But I am not sure how to add the second condition with an if statement.

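You can add & to combine the logical tests with "AND" logic. In this case, though, I think you want to drop the matching columns rather than keep them, so you need to negate the final selection with !: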

    n.mat[, apply(n.mat, 2, FUN = function(x) !(length(unique(x)) == 2 & sum(x == '01') <= 2))]

Or even:

    n.mat[, !apply(n.mat, 2, FUN = function(x) length(unique(x)) == 2 & sum(x == '01') <= 2)]

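You can also express it as the columns to keep, that is, those that fail at least one of the two conditions, combined with | ("OR") logic: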

    n.mat[, apply(n.mat, 2, FUN = function(x) length(unique(x)) != 2 | sum(x == '01') > 2)]

All of which give:

    #   [,1] [,2]  [,3]   [,4] 
    #r1 "10" "001" "0001" "100"
    #r2 "01" "001" "0001" "100"
    #r3 "10" "100" "1000" "010"
    #r4 "10" "010" "0100" "001"
    #r5 "01" "010" "0010" "001"
    #r6 "01" "010" "0010" "001"
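
To see why the three expressions agree, it can help to evaluate the two conditions per column before combining them. The following is only a minimal sketch using the n.mat from the question; the names n_distinct, n_01, drop_col, res_a, res_b and res_c are introduced here for illustration and are not part of the original answer.

```r
# Rebuild the example matrix from the question
r1 <- c("10", "001", "0001", "01", "100", "10")
r2 <- c("01", "001", "0001", "10", "100", "10")
r3 <- c("10", "100", "1000", "10", "010", "01")
r4 <- c("10", "010", "0100", "10", "001", "10")
r5 <- c("01", "010", "0010", "10", "001", "10")
r6 <- c("01", "010", "0010", "10", "001", "01")
n.mat <- rbind(r1, r2, r3, r4, r5, r6)

# Evaluate each condition separately, one value per column
n_distinct <- apply(n.mat, 2, function(x) length(unique(x)))
n_01       <- colSums(n.mat == "01")

# A column is dropped only when both conditions hold
drop_col <- n_distinct == 2 & n_01 <= 2
which(drop_col)  # columns 4 and 6

# The three forms from the answer select the same columns
res_a <- n.mat[, apply(n.mat, 2, function(x) !(length(unique(x)) == 2 & sum(x == "01") <= 2))]
res_b <- n.mat[, !apply(n.mat, 2, function(x) length(unique(x)) == 2 & sum(x == "01") <= 2)]
res_c <- n.mat[, apply(n.mat, 2, function(x) length(unique(x)) != 2 | sum(x == "01") > 2)]
stopifnot(identical(res_a, res_b), identical(res_b, res_c))
```

If it is possible that only one column survives the filter, adding drop = FALSE inside the brackets keeps the result a matrix instead of letting R simplify it to a vector.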

There may also be some trickier ways using column sums, which could be quicker if you have a lot of data, e.g.:

    n.mat[, !(
      (colSums(n.mat == "01") <= 2) &
      colSums(matrix(n.mat %in% c("10","01"), nrow=nrow(n.mat), ncol=ncol(n.mat))) == nrow(n.mat)
    )]
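
As a rough check that the column-sum approach agrees with the apply() approach on the example data, here is a hedged sketch. It assumes the n.mat from the question; few_01, only_10_01, keep_fast, keep_apply and result are names made up for this illustration. It also swaps %in% for == combined with |, which keeps the matrix shape so the matrix() rebuild is not needed. The two approaches are not identical in general. The column-sum test asks whether every entry is "10" or "01", so a column holding a single repeated value would also be dropped. The unique() test instead drops any column with exactly two distinct strings and at most two "01" entries, whatever those strings are. On the example data both give the same result.

```r
# Rebuild the example matrix from the question
r1 <- c("10", "001", "0001", "01", "100", "10")
r2 <- c("01", "001", "0001", "10", "100", "10")
r3 <- c("10", "100", "1000", "10", "010", "01")
r4 <- c("10", "010", "0100", "10", "001", "10")
r5 <- c("01", "010", "0010", "10", "001", "10")
r6 <- c("01", "010", "0010", "10", "001", "01")
n.mat <- rbind(r1, r2, r3, r4, r5, r6)

# Vectorised form of the two conditions, one logical value per column
few_01     <- colSums(n.mat == "01") <= 2
only_10_01 <- colSums(n.mat == "10" | n.mat == "01") == nrow(n.mat)
keep_fast  <- !(few_01 & only_10_01)

# Same selection computed column by column with apply()
keep_apply <- !apply(n.mat, 2, function(x) length(unique(x)) == 2 & sum(x == "01") <= 2)
stopifnot(identical(unname(keep_fast), unname(keep_apply)))

# drop = FALSE keeps the result a matrix even if only one column remains
result <- n.mat[, keep_fast, drop = FALSE]
dim(result)  # 6 rows, 4 columns

# Edge case where the two tests disagree: a column with one repeated value
x <- rep("10", 6)
length(unique(x)) == 2        # FALSE, so the apply() version keeps the column
all(x %in% c("10", "01"))     # TRUE, so the column-sum version drops it
```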
