Subset a dataset in R based on three columns
In the following data frame, for records that share the same id and name, I want to remove the rows where class is 0.
For example, the 1st and 2nd records have the same id and name. The same is true of the 3rd and 4th records.
The final data frame will be as below.
Please help me do this in R. My actual dataset has thousands of such records.
Here is the sample dataset:
Data <- data.frame(
  id = c(1, 1, 2, 2, 3, 4, 5),
  name = c("asd", "asd", "pqr", "pqr", "fgh", "yut", "kju"),
  date = c("02/03/2022", "10/05/2022", "23/01/2022", "15/04/2022",
           "19/05/2022", "14/02/2022", "10/06/2022"),
  class = c(0, 1, 0, 1, 0, 0, 1)
)
You may try:
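library(dplyr)
Data %>%
  # group on both key columns, since the question matches on id and name
  group_by(id, name) %>%
  # within groups of more than one row, drop the rows where class is 0
  filter(!(n() > 1 & class == 0))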
     id name  date       class
  <dbl> <chr> <chr>      <dbl>
1     1 asd   10/05/2022     1
2     2 pqr   15/04/2022     1
3     3 fgh   19/05/2022     0
4     4 yut   14/02/2022     0
5     5 kju   10/06/2022     1
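If adding a package is not an option, the same rule can be written in base R. This is only a sketch of one possible equivalent, not part of the original answer: `group_size` and `result` are names made up for the illustration, and `ave()` with `FUN = length` is used to attach each group's row count to every row.

```r
# Sample data from the question
Data <- data.frame(
  id = c(1, 1, 2, 2, 3, 4, 5),
  name = c("asd", "asd", "pqr", "pqr", "fgh", "yut", "kju"),
  date = c("02/03/2022", "10/05/2022", "23/01/2022", "15/04/2022",
           "19/05/2022", "14/02/2022", "10/06/2022"),
  class = c(0, 1, 0, 1, 0, 0, 1)
)

# One grouping factor per (id, name) combination that actually occurs
grp <- interaction(Data$id, Data$name, drop = TRUE)

# Number of rows in each group, repeated for every row of that group
group_size <- ave(seq_len(nrow(Data)), grp, FUN = length)

# Drop rows that are class 0 inside a group with more than one row
result <- Data[!(group_size > 1 & Data$class == 0), ]
rownames(result) <- NULL
result
```

Rows whose (id, name) pair occurs only once are kept whatever their class, which matches the output shown above for ids 3 and 4.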
Or a data.table approach:
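library(data.table)
setDT(Data)
# sort so the highest class comes first within each id,
# then keep the first row for every id and name combination
unique(Data[order(id, -class)], by = c("id", "name"))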
Output:
| id|name |date | class|
|--:|:----|:----------|-----:|
| 1|asd |10/05/2022 | 1|
| 2|pqr |15/04/2022 | 1|
| 3|fgh |19/05/2022 | 0|
| 4|yut |14/02/2022 | 0|
| 5|kju |10/06/2022 | 1|
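One caveat worth checking against your real data: sorting and then calling `unique()` keeps a single row per group, whereas a filter drops only the class 0 rows. The two agree on the sample data, but they can differ when a group has more than one row with class 1. The sketch below uses a small made-up table (`DT`, with a temporary helper column `n`; neither comes from the question) to show the difference, and assumes grouping on both id and name.

```r
library(data.table)

# Hypothetical toy data: the "asd" group has two rows with class 1
DT <- data.table(
  id = c(1, 1, 1, 2),
  name = c("asd", "asd", "asd", "pqr"),
  class = c(0, 1, 1, 0)
)

# Sort-and-deduplicate: keeps exactly one row per (id, name) group
one_per_group <- unique(DT[order(id, -class)], by = c("id", "name"))

# Filter: drops class 0 rows only in groups that have more than one row
DT[, n := .N, by = .(id, name)]                      # temporary group size
filtered <- DT[!(n > 1 & class == 0)][, n := NULL]   # filter, drop helper
DT[, n := NULL]                                      # restore the input

nrow(one_per_group)
nrow(filtered)
```

If every group in your data has at most one class 1 row, either approach should give the same result. Otherwise, pick the one that matches what you want to keep.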