
Filter rows if one column's values match based on another column in an R data frame

As I am very new to R programming, I need your help to find the answer.

I have the data frame below as input; now I want to return the rows that have the same EntryName but a different Sequence.

EntryName   Entry  GeneNames   Organism Length Sequence Postion
HXA13_HUMAN P31271 HOXA13 HOX  Human       388 AAAA          12
SOX21_HUMAN Q9Y651 SOX21 SOX25 Human       276 AAAA          13
RBM24_HUMAN Q9BX46 RBM24 RNPC6 Human       236 AAAE          14
MZT1_HUMAN  Q08AG7 MZT1 C13orf Human       191 AAAK          15
HXA13_HUMAN P51589 HOXA13 HOXk Human       100 ABAB         120

Now I want to filter the rows with Sequence AAAA, and also return every other row whose EntryName matches the EntryName of an AAAA row but has a different Sequence.

I expect the following output:

EntryName   Entry  GeneNames   Organism Length Sequence Postion
HXA13_HUMAN P31271 HOXA13 HOX  Human       388 AAAA          12
HXA13_HUMAN P51589 HOXA13 HOXk Human       100 ABAB         120

Besides an R script, a MongoDB solution would also be helpful. Thank you so much in advance!

We could group by EntryName and filter:

library(dplyr)
# keep every row of an EntryName group that contains at least one "AAAA" Sequence
df1 %>%
    group_by(EntryName) %>%
    filter('AAAA' %in% Sequence) %>%
    ungroup

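Or, to keep only the EntryName groups that contain more than one distinct Sequence (the version above also keeps SOX21_HUMAN, whose only Sequence is AAAA), it could be: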

df1 %>%
    group_by(EntryName) %>%
    filter(n_distinct(Sequence) > 1) %>%
    ungroup
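For readers without dplyr, here is a base R sketch of the n_distinct(Sequence) > 1 filter. It is an illustration, not the answerer's code: it rebuilds a trimmed copy of df1 (only EntryName, Entry and Sequence) and uses ave() to compute a per-group count that is recycled to every row, which plays the role of group_by() + filter().

```r
# Trimmed copy of df1 from this thread (only the columns needed here)
df1 <- data.frame(
  EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", "MZT1_HUMAN", "HXA13_HUMAN"),
  Entry     = c("P31271", "Q9Y651", "Q9BX46", "Q08AG7", "P51589"),
  Sequence  = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"),
  stringsAsFactors = FALSE
)

# ave() splits the row indices by EntryName, applies FUN to each group and
# writes the length-1 result back to every row of that group.
# Passing seq_len() as x keeps the result integer instead of character.
n_seq <- ave(seq_len(nrow(df1)), df1$EntryName,
             FUN = function(i) length(unique(df1$Sequence[i])))

# Keep rows whose EntryName occurs with more than one distinct Sequence
result <- df1[n_seq > 1, ]
print(result)
```

Here n_seq is 2 for both HXA13_HUMAN rows and 1 for every other row, so only the two HXA13_HUMAN rows survive.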

-output

# A tibble: 2 × 7
  EntryName   Entry  GeneNames   Organism Length Sequence Postion
  <chr>       <chr>  <chr>       <chr>     <int> <chr>      <int>
1 HXA13_HUMAN P31271 HOXA13 HOX  Human       388 AAAA          12
2 HXA13_HUMAN P51589 HOXA13 HOXk Human       100 ABAB         120

data

df1 <- structure(list(EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", 
"MZT1_HUMAN", "HXA13_HUMAN"), Entry = c("P31271", "Q9Y651", "Q9BX46", 
"Q08AG7", "P51589"), GeneNames = c("HOXA13 HOX", "SOX21 SOX25", 
"RBM24 RNPC6", "MZT1 C13orf", "HOXA13 HOXk"), Organism = c("Human", 
"Human", "Human", "Human", "Human"), Length = c(388L, 276L, 236L, 
191L, 100L), Sequence = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"
), Postion = c(12L, 13L, 14L, 15L, 120L)), 
class = "data.frame", row.names = c(NA, 
-5L))

Base R:

subset(df1, EntryName %in% unique(EntryName[Sequence == "AAAA"]))
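To see why this one-liner also returns the SOX21_HUMAN row, the same expression can be split into its intermediate steps. This is a sketch on a trimmed copy of df1 with only the EntryName and Sequence columns.

```r
# Trimmed copy of df1 from this thread
df1 <- data.frame(
  EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", "MZT1_HUMAN", "HXA13_HUMAN"),
  Sequence  = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"),
  stringsAsFactors = FALSE
)

# Step 1: the EntryName of every row whose Sequence is "AAAA"
keys <- unique(df1$EntryName[df1$Sequence == "AAAA"])
print(keys)   # "HXA13_HUMAN" "SOX21_HUMAN"

# Step 2: TRUE for each row whose EntryName is one of those keys
mask <- df1$EntryName %in% keys
print(mask)   # TRUE TRUE FALSE FALSE TRUE

# Step 3: the same rows that subset() returns
result <- df1[mask, ]
```

Because SOX21_HUMAN has an AAAA row of its own, it passes the %in% test even though it has no second Sequence.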

  EntryName   Entry  GeneNames   Organism Length Sequence Postion
  <chr>       <chr>  <chr>       <chr>     <int> <chr>      <int>
1 HXA13_HUMAN P31271 HOXA13 HOX  Human       388 AAAA          12
2 SOX21_HUMAN Q9Y651 SOX21 SOX25 Human       276 AAAA          13
3 HXA13_HUMAN P51589 HOXA13 HOXk Human       100 ABAB         120

We could also use any():

library(dplyr)
df1 %>%
  group_by(EntryName) %>%
  filter(any(Sequence == "AAAA")) %>%
  ungroup

  EntryName   Entry  GeneNames   Organism Length Sequence Postion
  <chr>       <chr>  <chr>       <chr>     <int> <chr>      <int>
1 HXA13_HUMAN P31271 HOXA13 HOX  Human       388 AAAA          12
2 SOX21_HUMAN Q9Y651 SOX21 SOX25 Human       276 AAAA          13
3 HXA13_HUMAN P51589 HOXA13 HOXk Human       100 ABAB         120
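Reading the question as "the EntryName group contains AAAA and also has more than one distinct Sequence", both conditions can be checked together. A base R sketch under that assumption, again on a trimmed copy of df1, evaluates them per group with ave() and keeps the original row order:

```r
# Trimmed copy of df1 from this thread
df1 <- data.frame(
  EntryName = c("HXA13_HUMAN", "SOX21_HUMAN", "RBM24_HUMAN", "MZT1_HUMAN", "HXA13_HUMAN"),
  Entry     = c("P31271", "Q9Y651", "Q9BX46", "Q08AG7", "P51589"),
  Sequence  = c("AAAA", "AAAA", "AAAE", "AAAK", "ABAB"),
  stringsAsFactors = FALSE
)

# For each EntryName group: does it contain "AAAA" AND more than one distinct Sequence?
# ave() stores the per-group TRUE/FALSE into an integer vector (1/0),
# so as.logical() turns it back into a row mask.
keep_row <- as.logical(ave(
  seq_len(nrow(df1)), df1$EntryName,
  FUN = function(i) {
    s <- df1$Sequence[i]
    any(s == "AAAA") && length(unique(s)) > 1
  }
))

result <- df1[keep_row, ]
print(result)
```

With dplyr, the same idea would be filter(any(Sequence == "AAAA"), n_distinct(Sequence) > 1) inside group_by(EntryName), since filter() combines multiple conditions with AND.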


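Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.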

粤ICP备18138465号  © 2020-2024 STACKOOM.COM