
Subset a dataset in R based on three columns

In the following data frame, for records where id and name are the same, I want to remove the rows where class is 0.

[image of the input data frame]

For example, the 1st and 2nd records have the same id and name, as do the 3rd and 4th records.

The final data frame should look like this:

[image of the expected output]

Please help me do this in R. My actual dataset has thousands of such records.

Here is the sample dataset:

Data <- data.frame(
  id = c(1, 1, 2, 2, 3, 4, 5),
  name = c("asd", "asd", "pqr", "pqr", "fgh", "yut", "kju"),
  date = c("02/03/2022", "10/05/2022", "23/01/2022", "15/04/2022", "19/05/2022", "14/02/2022", "10/06/2022"),
  class = c(0, 1, 0, 1, 0, 0, 1)
)

You may try:

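library(dplyr)
Data %>%
  # group on both keys, since the question asks for matching id and name
  group_by(id, name) %>%
  # drop class 0 rows only in groups that have more than one row
  filter(!(n() > 1 & class == 0))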

     id name  date       class
  <dbl> <chr> <chr>      <dbl>
1     1 asd   10/05/2022     1
2     2 pqr   15/04/2022     1
3     3 fgh   19/05/2022     0
4     4 yut   14/02/2022     0
5     5 kju   10/06/2022     1

Or a data.table approach:

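library(data.table)

setDT(Data)  # converts Data to a data.table in place
# Sort so class 1 comes first within each id, then keep the first row per id/name pair
unique(Data[order(id, -class)], by = c("id", "name"))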

Output:

| id|name |date       | class|
|--:|:----|:----------|-----:|
|  1|asd  |10/05/2022 |     1|
|  2|pqr  |15/04/2022 |     1|
|  3|fgh  |19/05/2022 |     0|
|  4|yut  |14/02/2022 |     0|
|  5|kju  |10/06/2022 |     1|
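
For comparison, the same rule can also be written without extra packages. This is a minimal base R sketch, not taken from the answers above; `grp_size` and `result` are names introduced here for illustration. It counts the rows in each id/name pair with `ave()` and drops a class 0 row only when its pair occurs more than once.

```r
# Sample data from the question
Data <- data.frame(
  id = c(1, 1, 2, 2, 3, 4, 5),
  name = c("asd", "asd", "pqr", "pqr", "fgh", "yut", "kju"),
  date = c("02/03/2022", "10/05/2022", "23/01/2022", "15/04/2022",
           "19/05/2022", "14/02/2022", "10/06/2022"),
  class = c(0, 1, 0, 1, 0, 0, 1)
)

# Size of each (id, name) group, repeated for every row of that group
grp_size <- ave(seq_len(nrow(Data)), Data$id, Data$name, FUN = length)

# Keep a row unless it belongs to a duplicated pair and has class 0
result <- Data[!(grp_size > 1 & Data$class == 0), ]
result
```

On the sample data this keeps the same five rows as the outputs above. Unlike the `unique()` approach, it does not collapse a pair to a single row, so a pair with several class 1 rows would keep all of them.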
