Filter one table based on another in R

I have one table (1) that looks like this (it is an all-by-all distance matrix transformed into a tab-separated list):

sample1    sample2    405
sample3    sample4    400
sample5    sample6    1
sample7    sample8    20
sample1    sample3    40

I have another table (2), which contains the samples that meet a certain criterion:

sample1
sample2
sample8

How can I parse table (1) to extract only those rows in which the values in both column 1 and column 2 can be found in table (2)?

i.e. the desired comparisons are only:

sample1    sample2    405
sample2    sample8    40
sample8    sample1    100

I tried a similar set-up, using a data frame for your table (1) and a vector for your table (2).

table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
                        col_2 = c("b", "d", "f", "g"),
                        col_3 = c(1, 2, 3, 4))
table_two <- c("b", "d")

When you set it up that way, something like this should work:

library(tidyverse)
table_one %>% filter(col_1 %in% table_two,
                     col_2 %in% table_two)
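
If more than two columns need the same membership test, the condition can be written once. The following is a minimal sketch, assuming dplyr 1.0.4 or later (where if_all() is available) and reusing the toy table_one and table_two from above; the name kept is only for illustration.

```r
library(dplyr)

# Toy data from the answer above
table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
                        col_2 = c("b", "d", "f", "g"),
                        col_3 = c(1, 2, 3, 4))
table_two <- c("b", "d")

# Keep a row only if every listed column has its value in table_two
kept <- table_one %>%
  filter(if_all(c(col_1, col_2), ~ .x %in% table_two))

print(kept)
```

With this toy data only the row (b, d, 2) remains, since it is the only row where both col_1 and col_2 are in table_two.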

Here is a base R solution:

rawData1 <- "first second distance
 sample1    sample2    405
 sample3    sample4    400
 sample5    sample6    1
 sample7    sample8    20
 sample1    sample3    40"

rawData2 <- "sample
 sample1
 sample2
 sample8"

a <- read.table(textConnection(rawData1),stringsAsFactors=FALSE,header=TRUE)
b <- read.table(textConnection(rawData2),stringsAsFactors=FALSE,header=TRUE)

a[a$first %in% b$sample & a$second %in% b$sample, ]

...and the output:

> a[a$first %in% b$sample & a$second %in% b$sample, ]
    first  second distance
1 sample1 sample2      405
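
The same idea can be wrapped in a small function so that the set of columns to check is a parameter. This is only a sketch: filter_pairs and its argument names are hypothetical, not part of base R, and the data is re-created from the question.

```r
# Hypothetical helper: keep the rows of `pairs` whose value in every column
# named in `cols` appears in the vector `keep`.
filter_pairs <- function(pairs, keep, cols) {
  # One logical vector per checked column
  in_keep <- lapply(pairs[cols], function(x) x %in% keep)
  # A row is kept only if all of its checked columns match
  pairs[Reduce(`&`, in_keep), , drop = FALSE]
}

a <- read.table(text = "first second distance
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)

keep <- c("sample1", "sample2", "sample8")

res <- filter_pairs(a, keep, c("first", "second"))
print(res)
```

On the question's data this returns the same single row as the one-liner above (sample1, sample2, 405).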

Another option is to use inner_join twice, once on the first column and once on the second, and then take the intersect of the two result sets.

library(dplyr)

df1 <- read.table(text = "Samp1 Samp2  Val
sample1    sample2    405
sample3    sample4    400
sample5    sample6    1
sample7    sample8    20
sample1    sample3    40", header = TRUE, stringsAsFactors = FALSE)
> df1
    Samp1   Samp2 Val
1 sample1 sample2 405
2 sample3 sample4 400
3 sample5 sample6   1
4 sample7 sample8  20
5 sample1 sample3  40

df2 <- data.frame(Samp = c("sample1",
                           "sample2",
                           "sample8"), stringsAsFactors = FALSE)
> df2
     Samp
1 sample1
2 sample2
3 sample8

# Join Samp1 against Samp, then Samp2 against Samp, and keep the rows common to both results
intersect(inner_join(df1,df2, by = c("Samp1" = "Samp")),
      inner_join(df1,df2, by = c("Samp2" = "Samp")))

The result will be:
    Samp1   Samp2 Val
1 sample1 sample2 405
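
A possible variation, offered as a sketch rather than as the answer's own method, is to chain two semi_join calls. semi_join only filters the left table and never adds columns or duplicates rows, so no intersect step is needed; dplyr's intersect also drops duplicate rows, which may matter if the distance table contains repeated pairs. The name both is only for illustration.

```r
library(dplyr)

df1 <- read.table(text = "Samp1 Samp2 Val
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)

df2 <- data.frame(Samp = c("sample1", "sample2", "sample8"),
                  stringsAsFactors = FALSE)

# Keep rows whose Samp1 is in df2, then of those, rows whose Samp2 is in df2
both <- df1 %>%
  semi_join(df2, by = c("Samp1" = "Samp")) %>%
  semi_join(df2, by = c("Samp2" = "Samp"))

print(both)
```

For the data above this leaves the same single row (sample1, sample2, 405).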

