Filter one table based on another in R
I have one table(1) that looks like this (it is an all-by-all distance matrix transformed into a tab-separated list):
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40
I have another table(2), which contains the samples that meet certain criteria:
sample1
sample2
sample8
How can I parse the first table(1) to extract only those rows in which the values in both column 1 and column 2 can be found in table(2)?
i.e. the desired comparisons are only:
sample1 sample2 405
sample2 sample8 40
sample8 sample1 100
I tried a similar set-up using a data frame for your table(1) and a vector for your table(2).
table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
                        col_2 = c("b", "d", "f", "g"),
                        col_3 = c(1, 2, 3, 4))
table_two <- c("b", "d")
When you set it up that way, something like this should work:
library(tidyverse)
table_one %>%
  filter(col_1 %in% table_two,
         col_2 %in% table_two)
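If there are more than two columns to check, repeating the `%in%` condition gets tedious. A minimal sketch of one way to generalise it, assuming dplyr 1.0.4 or later (where `if_all()` is available) and reusing the toy data from above:

```r
library(dplyr)

table_one <- data.frame(col_1 = c("a", "b", "c", "d"),
                        col_2 = c("b", "d", "f", "g"),
                        col_3 = c(1, 2, 3, 4),
                        stringsAsFactors = FALSE)
table_two <- c("b", "d")

# Keep a row only if every listed column holds a value found in table_two
result <- table_one %>%
  filter(if_all(c(col_1, col_2), ~ .x %in% table_two))

print(result)
```

With this toy data only the row with `col_1 = "b"` and `col_2 = "d"` passes, since it is the only one where both values appear in `table_two`.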
Here is a base R solution:
rawData1 <- "first second distance
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40"
rawData2 <- "sample
sample1
sample2
sample8"
a <- read.table(textConnection(rawData1),stringsAsFactors=FALSE,header=TRUE)
b <- read.table(textConnection(rawData2),stringsAsFactors=FALSE,header=TRUE)
a[a$first %in% b$sample & a$second %in% b$sample, ]
...and the output:
> a[a$first %in% b$sample & a$second %in% b$sample, ]
first second distance
1 sample1 sample2 405
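Only one row comes back here because the five sample rows in the question contain just one pair where both names are in table(2). As a rough check, the sketch below appends the two extra rows that appear in the question's desired output and wraps the same `%in%` test in a small function. The name `keep_pairs` and its `cols` argument are hypothetical, made up for this illustration.

```r
# The five rows from the question, plus the two extra rows from its desired output
pairs <- read.table(text = "first second distance
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40
sample2 sample8 40
sample8 sample1 100", header = TRUE, stringsAsFactors = FALSE)

keep <- c("sample1", "sample2", "sample8")

# keep_pairs() is a hypothetical helper, not part of any package
keep_pairs <- function(df, ids, cols = c("first", "second")) {
  # One logical vector per checked column: TRUE where the value is in ids
  hits <- lapply(df[cols], `%in%`, ids)
  # A row survives only if every checked column matched
  df[Reduce(`&`, hits), , drop = FALSE]
}

result <- keep_pairs(pairs, keep)
print(result)
```

Because each column is tested separately, the order of the two names within a row does not matter, so `sample8 sample1` is kept just like `sample1 sample2`.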
The best option could be to use inner_join twice, once on the 1st column and once on the 2nd column, and then take the intersect of the two result sets.
library(dplyr)
df1 <- read.table(text = "Samp1 Samp2 Val
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)
> df1
Samp1 Samp2 Val
1 sample1 sample2 405
2 sample3 sample4 400
3 sample5 sample6 1
4 sample7 sample8 20
5 sample1 sample3 40
df2 <- data.frame(Samp = c("sample1",
"sample2",
"sample8"), stringsAsFactors = FALSE)
> df2
Samp
1 sample1
2 sample2
3 sample8
# Use inner_join between Samp1 and Samp, then again between Samp2 and Samp,
# and keep only the rows common to both results
intersect(inner_join(df1, df2, by = c("Samp1" = "Samp")),
          inner_join(df1, df2, by = c("Samp2" = "Samp")))
The result will be:
Samp1 Samp2 Val
1 sample1 sample2 405
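A possible variation on the same idea, offered as a sketch rather than a replacement: dplyr's `semi_join()` keeps the rows of the first table that have a match in the second without adding columns, so two chained semi-joins should give the same rows without the `intersect` step. The data below is the same `df1` and `df2` as above.

```r
library(dplyr)

df1 <- read.table(text = "Samp1 Samp2 Val
sample1 sample2 405
sample3 sample4 400
sample5 sample6 1
sample7 sample8 20
sample1 sample3 40", header = TRUE, stringsAsFactors = FALSE)

df2 <- data.frame(Samp = c("sample1", "sample2", "sample8"),
                  stringsAsFactors = FALSE)

# First keep rows whose Samp1 is listed in df2, then rows whose Samp2 is too
result <- df1 %>%
  semi_join(df2, by = c("Samp1" = "Samp")) %>%
  semi_join(df2, by = c("Samp2" = "Samp"))

print(result)
```

One difference to be aware of: `semi_join()` never duplicates rows of `df1`, even if a sample name were listed more than once in `df2`, whereas `inner_join()` would return one row per match.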