How to filter dataframe columns based on values intersected with a column in another dataframe
I have two data frames. I want to filter the gene ID columns of the expr_df data frame based on the values they share with another data frame, gene_annot. Basically, I want to keep only the genes in expr_df that intersect with gene_id in the gene_annot data. This is what my datasets look like:
I tried this command in R:
expr_df <- expr_df %>% select(one_of(intersect(gene_annot$gene_id, colnames(expr_df))))
But it gives me 0 or NA values for all the IDs.
The code you provided should work. However, the use of intersect() is superfluous: you can simply remove it. Also, dplyr::one_of() has been superseded; you should use dplyr::any_of() instead.
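A small sketch of why intersect() adds nothing here, using hypothetical toy data frames expr and annot (not the asker's data). any_of() already skips names that are not columns of the data frame, so it selects the same columns as pre-filtering with intersect() and then passing the result to the strict all_of(), which would error on a missing name.

```r
library(dplyr)

# Hypothetical toy data: a sample ID column plus three gene columns
expr <- tibble(sample_id = c("S1", "S2"),
               ENSG1 = c(1, 2),
               ENSG2 = c(3, 4),
               ENSG3 = c(5, 6))
# Hypothetical annotation that also lists an ID (ENSG9) absent from expr
annot <- tibble(gene_id = c("ENSG1", "ENSG3", "ENSG9"))

# any_of() silently ignores ENSG9 because it is not a column of expr
with_any_of <- expr %>% select(sample_id, any_of(annot$gene_id))

# Pre-filtering with intersect() gives the same columns; all_of() is used
# here because it errors on unknown names, and intersect() guarantees none
with_intersect <- expr %>%
  select(sample_id, all_of(intersect(annot$gene_id, colnames(expr))))

identical(with_any_of, with_intersect)
#> [1] TRUE
```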
library(tidyverse)
d1 <- tibble(sample_id = paste0("GTEX", 1:4),
ENSG1 = rnorm(4, 5, 1),
ENSG2 = rnorm(4, 50, 3),
ENSG3 = rnorm(4, 20, 7),
ENSG4 = rnorm(4, 3, 0.5))
d2 <- tibble(chr = 1:3, gene_id = c("ENSG1", "ENSG3", "ENSG4"))
d1 %>%
select(sample_id, any_of(d2$gene_id))
#> # A tibble: 4 × 4
#> sample_id ENSG1 ENSG3 ENSG4
#> <chr> <dbl> <dbl> <dbl>
#> 1 GTEX1 5.22 25.0 2.62
#> 2 GTEX2 5.99 -2.46 2.18
#> 3 GTEX3 4.56 22.0 3.29
#> 4 GTEX4 3.40 26.7 3.99
Created on 2022-07-18 by the reprex package (v2.0.1)
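For comparison, a base-R sketch of the same filter without dplyr. It uses hypothetical toy data (expr and annot) with the same layout as d1 and d2 in the reprex, but with fixed values instead of rnorm(). It keeps sample_id plus every column whose name appears in annot$gene_id. Unlike any_of(), this keeps the columns in their original order in expr rather than in the order of annot$gene_id.

```r
# Hypothetical toy data mirroring the layout of d1 and d2 in the reprex
expr <- data.frame(sample_id = paste0("GTEX", 1:4),
                   ENSG1 = 1:4, ENSG2 = 5:8, ENSG3 = 9:12, ENSG4 = 13:16)
annot <- data.frame(chr = 1:3, gene_id = c("ENSG1", "ENSG3", "ENSG4"))

# Logical mask: keep the ID column and any column listed in annot$gene_id
keep <- names(expr) == "sample_id" | names(expr) %in% annot$gene_id

# drop = FALSE keeps a data frame even if only one column matches
filtered <- expr[, keep, drop = FALSE]
names(filtered)
```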