考虑 R 中数据框中的条件提取行

Question

I want to extract the rows if all the columns are zero.如果所有列都为零，我想提取行。 In the first column of df , there is a list of gene IDs that are repeated per individuals in the second column.在df的第一列中，有一个基因 ID 列表，在第二列中每个个体重复。 I want to extract gene IDs if the df[,3:length(df)] are all zero for all individuals in the next column.如果下一列中所有个体的df[,3:length(df)]都为零，我想提取基因 ID。

> dim(df)
[1] 1040675      56

> df
ID     INDV     tra1   tr2   tr3  tra2   tr15   tr1b  
ENS777   1       1.2    0     0   1.6    3.3    0
ENS777   2       1.2    0     0   1.6    3.3    0
ENS777   3       1.2    0     0   1.6    3.3    0
ENS777   4       1.2    0     0   1.6    3.3    0
ENS999   1        0     0     0    0      0     0
ENS999   2        0     0     0    0      0     0
ENS999   3        0     0     0    0      0     0
ENS999   4        0     0     0    0      0     0
ENS888   1       1.2    0     0   1.6    3.3    0
ENS888   2       1.2    0     0   1.6    3.3    0
ENS888   3       1.2    0     0   1.6    3.3    0
ENS888   4       1.2    0     0   1.6    3.3    0

So, the out put would be ENS999 in this case.因此，在这种情况下，输出将是ENS999 。

Answer 1

If I understand correctly, if all the columns are zero for all individuals.如果我理解正确，如果所有个人的所有列都为零。

You can try this:你可以试试这个：

library(data.table)

df <- fread("Book2.csv")
> df
        ID INDV tra1 tr2 tr3 tra2 tr15 tr1b colsum
 1: ENS777    1  1.2   0   0  1.6  3.3    0    6.1
 2: ENS777    2  1.2   0   0  1.6  3.3    0    6.1
 3: ENS777    3  1.2   0   0  1.6  3.3    0    6.1
 4: ENS777    4  1.2   0   0  1.6  3.3    0    6.1
 5: ENS999    1  0.0   0   0  0.0  0.0    0    0.0
 6: ENS999    2  0.0   0   0  0.0  0.0    0    0.0
 7: ENS999    3  0.0   0   0  0.0  0.0    0    0.0
 8: ENS999    4  0.0   0   0  0.0  0.0    0    0.0
 9: ENS888    1  1.2   0   0  1.6  3.3    0    6.1
10: ENS888    2  1.2   0   0  1.6  3.3    0    6.1
11: ENS888    3  1.2   0   0  1.6  3.3    0    6.1
12: ENS888    4  1.2   0   0  1.6  3.3    0    6.1

#take the column sums
df[, colsum := tra1+tr2+tr3+tra2+tr15+tr1b]

#take the row sums by ID
df[, rowsum := sum(colsum), by= ID]

> df
        ID INDV tra1 tr2 tr3 tra2 tr15 tr1b colsum rowsum
 1: ENS777    1  1.2   0   0  1.6  3.3    0    6.1   24.4
 2: ENS777    2  1.2   0   0  1.6  3.3    0    6.1   24.4
 3: ENS777    3  1.2   0   0  1.6  3.3    0    6.1   24.4
 4: ENS777    4  1.2   0   0  1.6  3.3    0    6.1   24.4
 5: ENS999    1  0.0   0   0  0.0  0.0    0    0.0    0.0
 6: ENS999    2  0.0   0   0  0.0  0.0    0    0.0    0.0
 7: ENS999    3  0.0   0   0  0.0  0.0    0    0.0    0.0
 8: ENS999    4  0.0   0   0  0.0  0.0    0    0.0    0.0
 9: ENS888    1  1.2   0   0  1.6  3.3    0    6.1   24.4
10: ENS888    2  1.2   0   0  1.6  3.3    0    6.1   24.4
11: ENS888    3  1.2   0   0  1.6  3.3    0    6.1   24.4
12: ENS888    4  1.2   0   0  1.6  3.3    0    6.1   24.4

# Now you need the unique IDs for which the row sums are zero 
> unique(df[rowsum==0]$ID)
[1] "ENS999"

Answer 2

Using dplyr -使用dplyr -

library(dplyr)

df %>%
  group_by(ID) %>%
  filter(all(unlist(dplyr::select(cur_data(), tra1:tr1b) == 0))) %>%
  ungroup %>%
  distinct(ID)

#   ID    
#  <chr> 
#1 ENS999

考虑 R 中数据框中的条件提取行

问题描述

2 个解决方案

解决方案1
1 2021-07-26 13:07:17

解决方案2
0 已采纳 2021-07-26 13:26:24

考虑 R 中数据框中的条件提取行

问题描述

2 个解决方案

解决方案1 1 2021-07-26 13:07:17

解决方案2 0 已采纳 2021-07-26 13:26:24

解决方案1
1 2021-07-26 13:07:17

解决方案2
0 已采纳 2021-07-26 13:26:24