Filtering one table based on another table in R

Question

I have a table (1) that looks like this (it is an all-against-all distance matrix transformed into a tab-delimited list):

sample1    sample2    405
sample3    sample4    400
sample5    sample6    1
sample7    sample8    20
sample1    sample3    40

I also have another table (2) containing the samples that meet certain criteria:

sample1
sample2
sample8

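How can I parse table (1) to extract only those rows where both the column 1 and the column 2 values can be found in table (2)?

That is, the only comparisons wanted are: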

sample1    sample2    405
sample2    sample8    40
sample8    sample1    100

Answer 1

I tried this with a similar setup, using a data frame for table (1) and a vector for table (2).

table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
                        col_2 = c("b", "d", "f", "g"),
                        col_3 = c(1, 2, 3, 4))
table_two <- c("b", "d")

With the data set up this way, you should be able to do the following:

library(tidyverse)
table_one %>% filter(col_1 %in% table_two,
                     col_2 %in% table_two)
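
As a rough sketch of how the same filter might look on the data from the question: this assumes both tables are read without a header row, so that read.table assigns the default column names V1, V2 and V3. The inline text below stands in for the real files, whose paths are not given in the question.

```r
library(dplyr)

# Inline stand-ins for the two files from the question (no header rows).
table_one <- read.table(text = "sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = FALSE, stringsAsFactors = FALSE)

# Take the single column of table (2) as a plain character vector.
table_two <- read.table(text = "sample1
sample2
sample8", header = FALSE, stringsAsFactors = FALSE)$V1

# Keep only rows where both V1 and V2 appear in table_two.
result <- table_one %>% filter(V1 %in% table_two, V2 %in% table_two)
print(result)
```

Separating the conditions with a comma inside filter() combines them with a logical AND, so a row such as sample7 / sample8, where only one of the two samples is listed, is dropped.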

Answer 2

Here is a base R solution:

rawData1 <- "first second distance
 sample1    sample2    405
 sample3    sample4    400
 sample5    sample6    1
 sample7    sample8    20
 sample1    sample3    40"

rawData2 <- "sample
 sample1
 sample2
 sample8"

a <- read.table(textConnection(rawData1), stringsAsFactors = FALSE, header = TRUE)
b <- read.table(textConnection(rawData2), stringsAsFactors = FALSE, header = TRUE)

a[a$first %in% b$sample & a$second %in% b$sample, ]

...and the output:

> a[a$first %in% b$sample & a$second %in% b$sample, ]
    first  second distance
1 sample1 sample2      405
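
If there were more than two identifier columns, the same base R idea could be generalised along the following lines. This is only a sketch: the names id_cols, keep and in_keep are made up for illustration, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Same data as above, built directly as a data frame.
a <- data.frame(first    = c("sample1", "sample3", "sample5", "sample7", "sample1"),
                second   = c("sample2", "sample4", "sample6", "sample8", "sample3"),
                distance = c(405, 400, 1, 20, 40),
                stringsAsFactors = FALSE)
keep <- c("sample1", "sample2", "sample8")

# Columns whose values must all be present in `keep` (hypothetical name).
id_cols <- c("first", "second")

# One logical vector per column, then AND them together element-wise.
in_keep <- Reduce(`&`, lapply(a[id_cols], `%in%`, keep))

result <- a[in_keep, ]
print(result)
```

Swapping `&` for `|` in the Reduce call would instead keep rows where at least one of the columns matches, which is not what the question asks for but is a common variant.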

Answer 3

The best option is probably to inner_join twice, once on the first column and once on the second, and then take the intersect of the two result sets.

library(dplyr)

df1 <- read.table(text = "Samp1 Samp2  Val
sample1    sample2    405
sample3    sample4    400
sample5    sample6    1
sample7    sample8    20
sample1    sample3    40", header = TRUE, stringsAsFactors = FALSE)
> df1
    Samp1   Samp2 Val
1 sample1 sample2 405
2 sample3 sample4 400
3 sample5 sample6   1
4 sample7 sample8  20
5 sample1 sample3  40

df2 <- data.frame(Samp = c("sample1",
                           "sample2",
                           "sample8"), stringsAsFactors = FALSE)
> df2
     Samp
1 sample1
2 sample2
3 sample8

# inner_join Samp1 with Samp, then Samp2 with Samp, and keep the rows common to both results
intersect(inner_join(df1, df2, by = c("Samp1" = "Samp")),
          inner_join(df1, df2, by = c("Samp2" = "Samp")))

The result will be:
    Samp1   Samp2 Val
1 sample1 sample2 405
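
A possible variation, not part of the answer above, is to chain two dplyr semi_join calls instead. semi_join keeps the rows of the first table that have a match in the second without adding columns or duplicating rows, so no intersect step is needed. The sketch below rebuilds df1 and df2 inline so it can be run on its own.

```r
library(dplyr)

df1 <- data.frame(Samp1 = c("sample1", "sample3", "sample5", "sample7", "sample1"),
                  Samp2 = c("sample2", "sample4", "sample6", "sample8", "sample3"),
                  Val   = c(405, 400, 1, 20, 40),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Samp = c("sample1", "sample2", "sample8"),
                  stringsAsFactors = FALSE)

# Filter on Samp1 first, then filter the survivors on Samp2.
result <- df1 %>%
  semi_join(df2, by = c("Samp1" = "Samp")) %>%
  semi_join(df2, by = c("Samp2" = "Samp"))
print(result)
```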
