
Flag an entire group if a single row meets a condition using data.table

I have the following sample data:

> so <- data.table(Credit_id = rep(c("1-A", "17-F", "2-D"), each = 3), Period = rep(1:3, times = 3), Due_days = c(0,0,0, 0,30,0, 0,30,60))
> so
   Credit_id Period Due_days
1:       1-A      1        0
2:       1-A      2        0
3:       1-A      3        0
4:      17-F      1        0
5:      17-F      2       30
6:      17-F      3        0
7:       2-D      1        0
8:       2-D      2       30
9:       2-D      3       60

The data shows how three different credits performed during their first three months in a portfolio. Credit_id is the main key, Period is a time index, and Due_days shows how many days a client was overdue in a given period.

I want to create a new column Flag which can take two values: 0 and 1. Flag should be 1 for every row of a credit (grouped by Credit_id) whose Due_days was ever equal to or greater than 30.

This is the result I want to get to:

   Credit_id Period Due_days Flag
1:       1-A      1        0    0
2:       1-A      2        0    0
3:       1-A      3        0    0
4:      17-F      1        0    1
5:      17-F      2       30    1
6:      17-F      3        0    1
7:       2-D      1        0    1
8:       2-D      2       30    1
9:       2-D      3       60    1

That is, assign a 1 to every group that has at least one row where Due_days >= 30.

You can do:

so[, flag := +(any(Due_days >= 30)), by = Credit_id]

   Credit_id Period Due_days flag
1:       1-A      1        0    0
2:       1-A      2        0    0
3:       1-A      3        0    0
4:      17-F      1        0    1
5:      17-F      2       30    1
6:      17-F      3        0    1
7:       2-D      1        0    1
8:       2-D      2       30    1
9:       2-D      3       60    1
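
The one-liner works because any() returns a single value per group and := recycles it across that group's rows. A minimal, self-contained sketch, assuming the sample data from the question is stored in `so`:

```r
library(data.table)

# Sample data from the question
so <- data.table(
  Credit_id = rep(c("1-A", "17-F", "2-D"), each = 3),
  Period    = rep(1:3, times = 3),
  Due_days  = c(0, 0, 0, 0, 30, 0, 0, 30, 60)
)

# any() yields one logical per Credit_id; := recycles it over the group's rows,
# and the unary + turns TRUE/FALSE into 1/0
so[, flag := +(any(Due_days >= 30)), by = Credit_id]

# One row per credit, to check that the flag is constant within each group
unique(so[, .(Credit_id, flag)])
```

Note that := modifies `so` by reference, so no reassignment is needed.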

Or the same with base R:

with(so, ave(Due_days, Credit_id, FUN = function(x) +(any(x >= 30))))
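
The ave() call only returns a vector; it does not add a column by itself. A short sketch of storing the result, assuming a plain data.frame copy of the sample data (the name `so_df` is made up for this illustration):

```r
# Sample data from the question as a base R data.frame (so_df is a hypothetical name)
so_df <- data.frame(
  Credit_id = rep(c("1-A", "17-F", "2-D"), each = 3),
  Period    = rep(1:3, times = 3),
  Due_days  = c(0, 0, 0, 0, 30, 0, 0, 30, 60)
)

# ave() applies FUN within each Credit_id and returns a vector as long as Due_days;
# assigning that vector creates the new column
so_df$flag <- with(so_df, ave(Due_days, Credit_id, FUN = function(x) +(any(x >= 30))))

so_df
```

Because ave() writes the group results back into a copy of Due_days, the new column should come out as numeric 0/1 here rather than integer.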

any() tests whether at least one value per group fulfills the condition. As @Calum You already noted, + is just a quick way to transform a logical vector into a vector of integers.

To illustrate the use of +:

+(c(TRUE, FALSE))
[1] 1 0

Other possibilities are:

c(TRUE, FALSE) * 1

Or:

as.integer(c(TRUE, FALSE))
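
The three conversions print the same values but do not all return the same type, which may matter if the flag column has to be a true integer. A quick check (the variable names are just for illustration):

```r
x <- c(TRUE, FALSE)

plus_x    <- +x             # unary plus on a logical gives an integer vector
times_one <- x * 1          # multiplying by the double 1 gives a double vector
as_int    <- as.integer(x)  # explicit conversion to integer

class(plus_x)     # "integer"
class(times_one)  # "numeric"
class(as_int)     # "integer"
```

Using x * 1L instead of x * 1 would keep the result an integer.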
