简体   繁体   English

考虑 R 中数据框中的条件提取行

[英]Extract rows considering a condition in a data frame in R

I want to extract the rows if all the columns are zero.如果所有列都为零,我想提取行。 In the first column of df , there is a list of gene IDs that are repeated per individuals in the second column.df的第一列中,有一个基因 ID 列表,在第二列中每个个体重复。 I want to extract gene IDs if the df[,3:length(df)] are all zero for all individuals in the next column.如果下一列中所有个体的df[,3:length(df)]都为零,我想提取基因 ID。

> dim(df)
[1] 1040675      56

> df
ID     INDV     tra1   tr2   tr3  tra2   tr15   tr1b  
ENS777   1       1.2    0     0   1.6    3.3    0
ENS777   2       1.2    0     0   1.6    3.3    0
ENS777   3       1.2    0     0   1.6    3.3    0
ENS777   4       1.2    0     0   1.6    3.3    0
ENS999   1        0     0     0    0      0     0
ENS999   2        0     0     0    0      0     0
ENS999   3        0     0     0    0      0     0
ENS999   4        0     0     0    0      0     0
ENS888   1       1.2    0     0   1.6    3.3    0
ENS888   2       1.2    0     0   1.6    3.3    0
ENS888   3       1.2    0     0   1.6    3.3    0
ENS888   4       1.2    0     0   1.6    3.3    0

So, the out put would be ENS999 in this case.因此,在这种情况下,输出将是ENS999

If I understand correctly, if all the columns are zero for all individuals.如果我理解正确,如果所有个人的所有列都为零。

You can try this:你可以试试这个:

library(data.table)

df <- fread("Book2.csv")
> df
        ID INDV tra1 tr2 tr3 tra2 tr15 tr1b colsum
 1: ENS777    1  1.2   0   0  1.6  3.3    0    6.1
 2: ENS777    2  1.2   0   0  1.6  3.3    0    6.1
 3: ENS777    3  1.2   0   0  1.6  3.3    0    6.1
 4: ENS777    4  1.2   0   0  1.6  3.3    0    6.1
 5: ENS999    1  0.0   0   0  0.0  0.0    0    0.0
 6: ENS999    2  0.0   0   0  0.0  0.0    0    0.0
 7: ENS999    3  0.0   0   0  0.0  0.0    0    0.0
 8: ENS999    4  0.0   0   0  0.0  0.0    0    0.0
 9: ENS888    1  1.2   0   0  1.6  3.3    0    6.1
10: ENS888    2  1.2   0   0  1.6  3.3    0    6.1
11: ENS888    3  1.2   0   0  1.6  3.3    0    6.1
12: ENS888    4  1.2   0   0  1.6  3.3    0    6.1

#take the column sums
df[, colsum := tra1+tr2+tr3+tra2+tr15+tr1b]

#take the row sums by ID
df[, rowsum := sum(colsum), by= ID]

> df
        ID INDV tra1 tr2 tr3 tra2 tr15 tr1b colsum rowsum
 1: ENS777    1  1.2   0   0  1.6  3.3    0    6.1   24.4
 2: ENS777    2  1.2   0   0  1.6  3.3    0    6.1   24.4
 3: ENS777    3  1.2   0   0  1.6  3.3    0    6.1   24.4
 4: ENS777    4  1.2   0   0  1.6  3.3    0    6.1   24.4
 5: ENS999    1  0.0   0   0  0.0  0.0    0    0.0    0.0
 6: ENS999    2  0.0   0   0  0.0  0.0    0    0.0    0.0
 7: ENS999    3  0.0   0   0  0.0  0.0    0    0.0    0.0
 8: ENS999    4  0.0   0   0  0.0  0.0    0    0.0    0.0
 9: ENS888    1  1.2   0   0  1.6  3.3    0    6.1   24.4
10: ENS888    2  1.2   0   0  1.6  3.3    0    6.1   24.4
11: ENS888    3  1.2   0   0  1.6  3.3    0    6.1   24.4
12: ENS888    4  1.2   0   0  1.6  3.3    0    6.1   24.4

# Now you need the unique IDs for which the row sums are zero 
> unique(df[rowsum==0]$ID)
[1] "ENS999"


Using dplyr -使用dplyr -

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(all(unlist(dplyr::select(cur_data(), tra1:tr1b) == 0))) %>%
  ungroup %>%
  distinct(ID)

#   ID    
#  <chr> 
#1 ENS999

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM