Filter rows if one column's values match, based on another column, in an R data frame
As I am very new to R programming, I need your help to find the answer.
I have the data frame below as input. Now I want to return the rows that have the same EntryName but a different Sequence.
EntryName | Entry | GeneNames | Organism | Length | Sequence | Postion
---|---|---|---|---|---|---
HXA13_HUMAN | P31271 | HOXA13 HOX | Human | 388 | AAAA | 12
SOX21_HUMAN | Q9Y651 | SOX21 SOX25 | Human | 276 | AAAA | 13
RBM24_HUMAN | Q9BX46 | RBM24 RNPC6 | Human | 236 | AAAE | 14
MZT1_HUMAN | Q08AG7 | MZT1 C13orf | Human | 191 | AAAK | 15
HXA13_HUMAN | P51589 | HOXA13 HOXk | Human | 100 | ABAB | 120
Now I want to filter the rows with Sequence AAAA, and it should also return the entire row of any other Sequence whose EntryName matches the EntryName of an AAAA row.
I am expecting the output below:
EntryName | Entry | GeneNames | Organism | Length | Sequence | Postion
---|---|---|---|---|---|---
HXA13_HUMAN | P31271 | HOXA13 HOX | Human | 388 | AAAA | 12
HXA13_HUMAN | P51589 | HOXA13 HOXk | Human | 100 | ABAB | 120
Along with the R script, a MongoDB solution would also be helpful. Thank you so much in advance!
We could do a group-by `filter`:
```r
library(dplyr)

df1 %>%
  group_by(EntryName) %>%            # work within each EntryName group
  filter('AAAA' %in% Sequence) %>%   # keep whole groups that contain an "AAAA" row
  ungroup()
```
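Note that this keeps every group that contains an "AAAA" row, so SOX21_HUMAN (whose only Sequence is AAAA) is returned too. If you want exactly the expected output, i.e. groups that contain "AAAA" *and* at least one other Sequence, a minimal base R sketch using set intersection could look like this. `target_entries`, `other_entries` and `result` are illustrative names, and the data frame is rebuilt inline so the snippet runs without dplyr.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", "MZT1_HUMAN", "HXA13_HUMAN"),
  Entry     = c("P31271", "Q9Y651", "Q9BX46", "Q08AG7", "P51589"),
  GeneNames = c("HOXA13 HOX", "SOX21 SOX25", "RBM24 RNPC6", "MZT1 C13orf", "HOXA13 HOXk"),
  Organism  = rep("Human", 5),
  Length    = c(388L, 276L, 236L, 191L, 100L),
  Sequence  = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"),
  Postion   = c(12L, 13L, 14L, 15L, 120L),
  stringsAsFactors = FALSE
)

# EntryNames that have at least one "AAAA" row
target_entries <- unique(df1$EntryName[df1$Sequence == "AAAA"])
# EntryNames that have at least one row with some other Sequence
other_entries <- unique(df1$EntryName[df1$Sequence != "AAAA"])

# Keep only the groups that satisfy both conditions
result <- df1[df1$EntryName %in% intersect(target_entries, other_entries), ]
result
```

Here only HXA13_HUMAN is in both sets, so rows 1 and 5 of `df1` are returned.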
Or, to keep only the EntryName groups that have more than one distinct Sequence:
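```r
df1 %>%
  group_by(EntryName) %>%
  filter(n_distinct(Sequence) > 1) %>%   # keep groups with more than one distinct Sequence
  ungroup()
```

-output (this is for the `n_distinct()` version; the `%in%` version above also keeps the SOX21_HUMAN row):

```
# A tibble: 2 × 7
  EntryName   Entry  GeneNames   Organism Length Sequence Postion
  <chr>       <chr>  <chr>       <chr>     <int> <chr>      <int>
1 HXA13_HUMAN P31271 HOXA13 HOX  Human       388 AAAA          12
2 HXA13_HUMAN P51589 HOXA13 HOXk Human       100 ABAB         120
```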
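The same "more than one distinct Sequence" rule can be sketched in base R with `tapply()`, which counts distinct values per EntryName and then filters with `%in%`. This is an alternative to the dplyr pipeline, not its internals; `n_seq`, `multi` and `result` are illustrative names.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", "MZT1_HUMAN", "HXA13_HUMAN"),
  Entry     = c("P31271", "Q9Y651", "Q9BX46", "Q08AG7", "P51589"),
  GeneNames = c("HOXA13 HOX", "SOX21 SOX25", "RBM24 RNPC6", "MZT1 C13orf", "HOXA13 HOXk"),
  Organism  = rep("Human", 5),
  Length    = c(388L, 276L, 236L, 191L, 100L),
  Sequence  = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"),
  Postion   = c(12L, 13L, 14L, 15L, 120L),
  stringsAsFactors = FALSE
)

# Number of distinct Sequence values per EntryName (a 1-d array named by EntryName)
n_seq <- tapply(df1$Sequence, df1$EntryName, function(s) length(unique(s)))
# EntryNames with more than one distinct Sequence
multi <- names(n_seq)[n_seq > 1]

# Return every row belonging to one of those EntryNames
result <- df1[df1$EntryName %in% multi, ]
result
```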
-data

```r
df1 <- structure(list(EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN",
"MZT1_HUMAN", "HXA13_HUMAN"), Entry = c("P31271", "Q9Y651", "Q9BX46",
"Q08AG7", "P51589"), GeneNames = c("HOXA13 HOX", "SOX21 SOX25",
"RBM24 RNPC6", "MZT1 C13orf", "HOXA13 HOXk"), Organism = c("Human",
"Human", "Human", "Human", "Human"), Length = c(388L, 276L, 236L,
191L, 100L), Sequence = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"
), Postion = c(12L, 13L, 14L, 15L, 120L)),
class = "data.frame", row.names = c(NA,
-5L))
```
Base R:
```r
# keep every row whose EntryName appears on at least one "AAAA" row
subset(df1, EntryName %in% unique(EntryName[Sequence == "AAAA"]))
```

```
    EntryName  Entry   GeneNames Organism Length Sequence Postion
1 HXA13_HUMAN P31271  HOXA13 HOX    Human    388     AAAA      12
2 SOX21_HUMAN Q9Y651 SOX21 SOX25    Human    276     AAAA      13
5 HXA13_HUMAN P51589 HOXA13 HOXk    Human    100     ABAB     120
```
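If you need this for sequences other than "AAAA", the `subset()` idea can be wrapped in a small function. `rows_sharing_entry()` below is a hypothetical helper, not part of any package; it is a minimal sketch that assumes the column names from the question. It uses `which()` so that a missing Sequence never selects an EntryName, and it skips `unique()` because `%in%` already tolerates duplicates.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", "MZT1_HUMAN", "HXA13_HUMAN"),
  Entry     = c("P31271", "Q9Y651", "Q9BX46", "Q08AG7", "P51589"),
  GeneNames = c("HOXA13 HOX", "SOX21 SOX25", "RBM24 RNPC6", "MZT1 C13orf", "HOXA13 HOXk"),
  Organism  = rep("Human", 5),
  Length    = c(388L, 276L, 236L, 191L, 100L),
  Sequence  = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"),
  Postion   = c(12L, 13L, 14L, 15L, 120L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every row whose EntryName also appears on a row with
# Sequence == target. which() drops NA comparisons, similar to how subset()
# treats NA as FALSE.
rows_sharing_entry <- function(df, target) {
  entries <- df$EntryName[which(df$Sequence == target)]
  df[df$EntryName %in% entries, , drop = FALSE]
}

aaaa <- rows_sharing_entry(df1, "AAAA")  # both HXA13_HUMAN rows plus SOX21_HUMAN
abab <- rows_sharing_entry(df1, "ABAB")  # both HXA13_HUMAN rows
none <- rows_sharing_entry(df1, "ZZZZ")  # no matching Sequence, so zero rows
```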
We could also use `any`:
```r
library(dplyr)

df1 %>%
  group_by(EntryName) %>%
  filter(any(Sequence == "AAAA")) %>%   # TRUE for the whole group if any of its rows is "AAAA"
  ungroup()
```

```
# A tibble: 3 × 7
  EntryName   Entry  GeneNames   Organism Length Sequence Postion
  <chr>       <chr>  <chr>       <chr>     <int> <chr>      <int>
1 HXA13_HUMAN P31271 HOXA13 HOX  Human       388 AAAA          12
2 SOX21_HUMAN Q9Y651 SOX21 SOX25 Human       276 AAAA          13
3 HXA13_HUMAN P51589 HOXA13 HOXk Human       100 ABAB         120
```
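For comparison, the group-wise `any()` idea can be sketched in base R with `stats::ave()`, which applies a function within each group and recycles the result back to every row of that group. This is an alternative to `group_by()` + `filter()`, not how dplyr implements it; `keep` and `result` are illustrative names.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", "MZT1_HUMAN", "HXA13_HUMAN"),
  Entry     = c("P31271", "Q9Y651", "Q9BX46", "Q08AG7", "P51589"),
  GeneNames = c("HOXA13 HOX", "SOX21 SOX25", "RBM24 RNPC6", "MZT1 C13orf", "HOXA13 HOXk"),
  Organism  = rep("Human", 5),
  Length    = c(388L, 276L, 236L, 191L, 100L),
  Sequence  = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"),
  Postion   = c(12L, 13L, 14L, 15L, 120L),
  stringsAsFactors = FALSE
)

# For each EntryName group, any() says whether the group has an "AAAA" row;
# ave() writes that single TRUE/FALSE back onto every row of the group
keep <- ave(df1$Sequence == "AAAA", df1$EntryName, FUN = any)

result <- df1[keep, ]
result
```

This selects the same three rows as the dplyr version, but keeps the original row names (1, 2, 5) because `df1` stays a plain data.frame.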