Flagging data within groups in R dataframe
I have the following table in an R data frame.
I would like to write the logic that generates the "keep" column. For each person, I would like to flag accounts that have a transaction within 4 days of first access.
The first line is a new account for this person, so flag it.
The second line's dates are only 2 days apart, so keep it too.
The third line is 11 days after we first saw this account, so we do NOT flag it.
The same logic goes for the next person. Flag only accounts that are less than 4 days old.
I have rebuilt your data frame; try this solution:
library(lubridate)
library(dplyr)
df <- data.frame(Person = c(rep("abc", 3), rep("eee", 5)),
                 date = c("4/1/2016", "4/3/2016", "4/12/2016", "5/3/2016",
                          "5/4/2016", "5/4/2016", "5/6/2016", "5/10/2016"),
                 account = c("123", "123", "123", "222", "222", "333", "222", "333"),
                 stringsAsFactors = FALSE)
df$date2 <- mdy(df$date)
The best solution, as suggested by @thelatemail:
df %>%
group_by(Person) %>%
mutate(keep=as.numeric(date2 - first(date2) <= 4)) %>%
select(-date2)
Result:
  Person      date account keep
1    abc  4/1/2016     123    1
2    abc  4/3/2016     123    1
3    abc 4/12/2016     123    0
4    eee  5/3/2016     222    1
5    eee  5/4/2016     222    1
6    eee  5/4/2016     333    1
7    eee  5/6/2016     222    1
8    eee 5/10/2016     333    0
My more convoluted original solution (useful if the account creation date is not in the first line for each person):
df %>%
group_by(Person) %>%
slice(which.min(date2)) %>%
select(Person, date2) %>%
rename(account_create = date2) %>%
merge(df, ., by = "Person") %>%
mutate(keep = as.numeric(date2 - account_create <= 4)) %>%
select(-c(date2, account_create))
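If the rows may arrive in any order, a shorter route than the merge is to compare each date with the per-person minimum instead of with the first row. Below is a minimal base R sketch of that idea, not part of the original answers. The names shuffled, day_num and first_day are made up for illustration, and the rows are shuffled on purpose to show that order does not matter.

```r
# Same sample data as above.
df <- data.frame(
  Person  = c(rep("abc", 3), rep("eee", 5)),
  date    = c("4/1/2016", "4/3/2016", "4/12/2016", "5/3/2016",
              "5/4/2016", "5/4/2016", "5/6/2016", "5/10/2016"),
  account = c("123", "123", "123", "222", "222", "333", "222", "333"),
  stringsAsFactors = FALSE
)

# Deliberately shuffle the rows (hypothetical ordering, for illustration only).
shuffled <- df[c(3, 1, 2, 8, 6, 4, 7, 5), ]

# Dates as day counts, so ave() can work on plain numbers.
day_num <- as.numeric(as.Date(shuffled$date, format = "%m/%d/%Y"))

# Earliest day per person, repeated for every row of that person.
first_day <- ave(day_num, shuffled$Person, FUN = min)

# Flag rows that fall within 4 days of the person's earliest date.
shuffled$keep <- as.numeric(day_num - first_day <= 4)

# Print in Person/date order for easier reading.
shuffled[order(shuffled$Person, day_num), ]
```

In dplyr the same idea would presumably be to use min(date2) in place of first(date2) inside mutate().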
Using data.table:
library(data.table)
setDT(df)[, Keep:=as.numeric(difftime(date,first(date),units="days") < 4), by=Person][]
We group by Person and then create the column Keep using the condition that the date is less than 4 days from the first(date) for the Person.
Here, we assume that the date column is a date-time object. If the date column is read in as character strings, then we can do the conversion using:
df$date <- as.POSIXct(df$date, format="%m/%d/%Y")
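A side note on the conversion above: when only calendar days matter, parsing to Date (or to POSIXct with an explicit time zone that has no daylight saving, such as UTC) keeps the differences at whole days. With a local time zone, a daylight-saving change inside the window can shift a difference by an hour. The sketch below is only an illustration; the vectors dates, d and p are hypothetical names, not from the original post.

```r
dates <- c("4/1/2016", "4/3/2016", "4/12/2016")

# Date objects: differences are whole calendar days.
d <- as.Date(dates, format = "%m/%d/%Y")
day_gap <- as.numeric(d - d[1])

# POSIXct in UTC: no daylight-saving shifts, so the gaps match the Date version.
p <- as.POSIXct(dates, format = "%m/%d/%Y", tz = "UTC")
utc_gap <- as.numeric(difftime(p, p[1], units = "days"))

day_gap
utc_gap
```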
With the data given by:
df <- structure(list(Person = c("abc", "abc", "abc", "eee", "eee",
"eee", "eee", "eee"), date = structure(c(1459483200, 1459656000,
1460433600, 1462248000, 1462334400, 1462334400, 1462507200, 1462852800
), class = c("POSIXct", "POSIXt"), tzone = ""), account = c(123L,
123L, 123L, 222L, 222L, 333L, 222L, 333L)), .Names = c("Person",
"date", "account"), row.names = c(NA, -8L), class = "data.frame")
The result is:
##  Person       date account Keep
##1    abc 2016-04-01     123    1
##2    abc 2016-04-03     123    1
##3    abc 2016-04-12     123    0
##4    eee 2016-05-03     222    1
##5    eee 2016-05-04     222    1
##6    eee 2016-05-04     333    1
##7    eee 2016-05-06     222    1
##8    eee 2016-05-10     333    0
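Note that this answer tests "< 4" while the dplyr answer tests "<= 4". The sample data has no row exactly 4 days after the first date, so both give the same flags here, but they would differ on such a row. A small sketch with made-up dates (first_seen, later and cmp are illustrative names only):

```r
# Hypothetical dates: 3, 4 and 5 days after the first sighting.
first_seen <- as.Date("2016-05-03")
later <- as.Date(c("2016-05-06", "2016-05-07", "2016-05-08"))

gap <- as.numeric(later - first_seen)

# Compare the strict and the inclusive threshold side by side.
cmp <- data.frame(
  gap       = gap,
  strict    = as.numeric(gap < 4),   # "less than 4 days"
  inclusive = as.numeric(gap <= 4)   # "4 days or less"
)
cmp
```

Which comparison is right depends on whether a transaction on day 4 itself should still count.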
Thanks for these great ideas; R is amazing, doing this relatively complicated accounting in four lines of code.
Another thing I did not emphasize is that I also need to keep track of whether it is a new account or not.
Also, since this data is not necessarily sorted, I sorted it first, so here is the final version.
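df %>%
  # Sort by date as well, so first(date2) is the earliest date in each group
  arrange(Person, account, date2) %>%
  # Group by account too, so each account is measured from its own first date
  group_by(Person, account) %>%
  mutate(keep = as.numeric(date2 - first(date2) < 4)) %>%
  select(-date2)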
Result:
  Person date      account  keep
  <chr>  <chr>     <chr>   <dbl>
1 abc    4/1/2016  123         1
2 abc    4/3/2016  123         1
3 abc    4/12/2016 123         0
4 eee    5/3/2016  222         1
5 eee    5/4/2016  222         1
6 eee    5/6/2016  222         1
7 eee    5/10/2016 333         1
8 eee    5/11/2016 333         1
So we keep the last line, since it is only 1 day from when the 333 account first showed up.