Filter rows out with specific value for each sample column

Keep the rows (Obs) whose value is over that sample's threshold in AT LEAST THREE SAMPLES, and remove the rows where this happens in 2 samples or fewer.

For example:

  • Obs1 is over the threshold only in S4 and S5 (2 samples), so it would be filtered out;
  • Obs2 is over the threshold in 4 samples and Obs3 in 3, so they would remain in the df.
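A quick base R sketch to check these counts: it counts, for each Obs row, how many samples are strictly greater than that sample's threshold (the same > comparison the answers use). It assumes, as in the df below, that the thresholds are stored in the first row; the names thr, vals, over and n_over are only for illustration.

```r
df <- data.frame(column = c("threshold", "Obs1", "Obs2", "Obs3"),
                 S1 = c(1.7, 1.4, 1.9, 1.3), S2 = c(0.9, 0.8, 2, 1),
                 S3 = c(2.5, 2.4, 2.1, 0.5), S4 = c(0.4, 0.5, 0.6, 0.9),
                 S5 = c(1.2, 1.4, 1.3, 1.6))

thr  <- unlist(df[1, -1])          # named vector of per-sample thresholds
vals <- as.matrix(df[-1, -1])      # Obs rows only, as a numeric matrix
over <- sweep(vals, 2, thr, ">")   # TRUE where a value exceeds its column's threshold
n_over <- setNames(rowSums(over), df$column[-1])
n_over
# Obs1 Obs2 Obs3
#    2    4    3
```

Obs1 exceeds its thresholds only in S4 and S5, so it stays below the cutoff of 3.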


df <- data.frame(column=c("threshold", "Obs1", "Obs2", "Obs3"), S1 = c(1.7,1.4,1.9,1.3), S2= c(0.9,0.8,2,1), S3=c(2.5,2.4,2.1,0.5), S4=c(0.4,0.5,0.6,0.9), S5=c(1.2,1.4,1.3,1.6))
 df

    column      S1  S2  S3  S4  S5
    threshold  1.7 0.9 2.5 0.4 1.2 
    Obs1       1.4 0.8 2.4 0.5 1.4 
    Obs2       1.9 2.0 2.1 0.6 1.3
    Obs3       1.3 1.0 0.5 0.9 1.6

Desired output:

    column      S1  S2  S3  S4  S5
    Obs2       1.9 2.0 2.1 0.6 1.3
    Obs3       1.3 1.0 0.5 0.9 1.6

I do not know how to code it, but I wonder whether something along these lines would work:

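library(dplyr)
# logic: 1 if a sample value is >= that sample's threshold (row 1 of df), else 0
logic <- df %>%
    slice(-1) %>%
    mutate(across(S1:S5, ~ as.integer(.x >= df[[cur_column()]][1])))
# keep rows flagged in at least 3 samples (note: this returns the 0/1 flags, not the original values)
logic %>% rowwise %>%
    filter(sum(c_across(where(is.numeric))) >= 3) %>%
    ungroup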

If we are using rowwise with c_across, just slice off the 'threshold' row, compare (>) each remaining row with the sliced dataset that contains only the 'threshold' row, and keep the rows where at least 3 comparisons are TRUE.

library(dplyr)
df %>%
    slice(-1) %>%                              # drop the 'threshold' row
    rowwise %>%
    filter(sum(c_across(where(is.numeric)) >   # this row's values vs. the thresholds
                   (df %>% slice(1) %>% select(-1))) >= 3) %>%
    ungroup

-output

# A tibble: 2 x 6
#  column    S1    S2    S3    S4    S5
#  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Obs2     1.9     2   2.1   0.6   1.3
#2 Obs3     1.3     1   0.5   0.9   1.6
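To see what the filter condition evaluates for a single row, the comparison can be run outside of rowwise. This sketch takes the Obs2 row as an example (the names thr_row, obs2 and cmp are only illustrative): comparing a numeric vector with a one-row data frame goes through R's data frame comparison method, giving one TRUE/FALSE per sample, and sum() counts the TRUE values.

```r
library(dplyr)

df <- data.frame(column = c("threshold", "Obs1", "Obs2", "Obs3"),
                 S1 = c(1.7, 1.4, 1.9, 1.3), S2 = c(0.9, 0.8, 2, 1),
                 S3 = c(2.5, 2.4, 2.1, 0.5), S4 = c(0.4, 0.5, 0.6, 0.9),
                 S5 = c(1.2, 1.4, 1.3, 1.6))

thr_row <- df %>% slice(1) %>% select(-1)          # one-row data frame of thresholds
obs2    <- unlist(df %>% slice(3) %>% select(-1))  # numeric values of Obs2 (row 3)
cmp <- obs2 > thr_row    # one logical per sample (S1..S5)
sum(cmp)                 # 4 samples over threshold, so Obs2 passes the >= 3 test
```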

If there are other character columns as well, we can change the select on the threshold subset to select(where(is.numeric)), so that only the numeric columns are compared:

df %>%
    slice(-1) %>%
    rowwise %>%
    filter(sum(c_across(where(is.numeric)) >
                   (df %>% slice(1) %>% select(where(is.numeric)))) >= 3)
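As an illustration, suppose the data had one more character column (the group column below is hypothetical, added only for this example). With select(-1) the threshold subset would still contain that text column, whereas select(where(is.numeric)) keeps exactly the columns that c_across(where(is.numeric)) compares against.

```r
library(dplyr)

df <- data.frame(column = c("threshold", "Obs1", "Obs2", "Obs3"),
                 S1 = c(1.7, 1.4, 1.9, 1.3), S2 = c(0.9, 0.8, 2, 1),
                 S3 = c(2.5, 2.4, 2.1, 0.5), S4 = c(0.4, 0.5, 0.6, 0.9),
                 S5 = c(1.2, 1.4, 1.3, 1.6))
df$group <- c("ref", "a", "b", "b")   # hypothetical extra character column

res <- df %>%
    slice(-1) %>%
    rowwise %>%
    # numeric values of this row vs. the numeric threshold columns only
    filter(sum(c_across(where(is.numeric)) >
                   (df %>% slice(1) %>% select(where(is.numeric)))) >= 3) %>%
    ungroup
res$column   # Obs2 and Obs3, as before
```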

Or another option with map

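library(dplyr)      # for select(), where(), first(), slice(), filter()
library(purrr)
library(magrittr)   # for is_greater_than()
# one logical vector per sample column: Obs values above that column's threshold (first element)
i1 <- map(df %>% select(where(is.numeric)), ~ .x[-1] > first(.x)) %>%
    reduce(`+`) %>%          # add them up: number of samples over threshold per Obs row
    is_greater_than(2)       # keep rows with at least 3
df %>%
    slice(-1) %>%
    filter(i1)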
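The map() step builds one logical vector per sample column, and reduce(`+`) adds them element-wise (TRUE counts as 1), so before is_greater_than(2) the pipeline holds per-row counts. This sketch stops at that intermediate step to show the counts; the name counts is only illustrative.

```r
library(dplyr)
library(purrr)

df <- data.frame(column = c("threshold", "Obs1", "Obs2", "Obs3"),
                 S1 = c(1.7, 1.4, 1.9, 1.3), S2 = c(0.9, 0.8, 2, 1),
                 S3 = c(2.5, 2.4, 2.1, 0.5), S4 = c(0.4, 0.5, 0.6, 0.9),
                 S5 = c(1.2, 1.4, 1.3, 1.6))

counts <- map(df %>% select(where(is.numeric)), ~ .x[-1] > first(.x)) %>%
    reduce(`+`)          # logical + logical gives integer counts per Obs row
counts
# [1] 2 4 3
counts > 2               # the same test as is_greater_than(2)
# [1] FALSE  TRUE  TRUE
```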

Or using base R with rowSums

df[-1,][rowSums(df[-1, -1] > df[1, -1][col(df[-1, -1])]) >=3,]
#  column  S1 S2  S3  S4  S5
#3   Obs2 1.9  2 2.1 0.6 1.3
#4   Obs3 1.3  1 0.5 0.9 1.6
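In the base R line, col(df[-1, -1]) is a matrix of column indices with the same shape as the data, so df[1, -1][col(...)] repeats each sample's threshold down its column and lines it up with every Obs value. A roughly equivalent sketch, using sweep() on a numeric matrix instead (a swapped-in technique, not the answer's code; keep and res are illustrative names):

```r
df <- data.frame(column = c("threshold", "Obs1", "Obs2", "Obs3"),
                 S1 = c(1.7, 1.4, 1.9, 1.3), S2 = c(0.9, 0.8, 2, 1),
                 S3 = c(2.5, 2.4, 2.1, 0.5), S4 = c(0.4, 0.5, 0.6, 0.9),
                 S5 = c(1.2, 1.4, 1.3, 1.6))

thr  <- unlist(df[1, -1])                                # per-sample thresholds
keep <- rowSums(sweep(as.matrix(df[-1, -1]), 2, thr, ">")) >= 3
res  <- df[-1, ][keep, ]   # keep is FALSE, TRUE, TRUE: Obs2 and Obs3 remain
res
```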

For future reference: if you work with character columns, you need to make sure the columns holding the values are numeric; if they are not, convert them:

df <- type.convert(df, as.is = TRUE) 
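For instance, if the sample columns had been read in as text (the character df_chr below is constructed only for illustration), c_across(where(is.numeric)) would find no numeric columns to compare. type.convert(as.is = TRUE) turns the columns that look numeric back into numbers and leaves genuine text columns as character:

```r
df_chr <- data.frame(column = c("threshold", "Obs1", "Obs2", "Obs3"),
                     S1 = c("1.7", "1.4", "1.9", "1.3"),
                     S2 = c("0.9", "0.8", "2", "1"),
                     stringsAsFactors = FALSE)
sapply(df_chr, class)     # every column is "character"

df_num <- type.convert(df_chr, as.is = TRUE)
sapply(df_num, class)     # column stays "character"; S1 and S2 become "numeric"
```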

and then this should work:

df2 <- df %>% slice(-1) %>% rowwise %>%
    filter(sum(c_across(where(is.numeric)) > (df %>% slice(1) %>% select(where(is.numeric)))) >= 3)

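Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.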

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM