Flagging data within groups in R dataframe

Question

I have the following table in an R data frame:

I would like to write the logic that generates the "keep" column. For each person, I would like to flag accounts that have a transaction within 4 days of first access. The first line is a new account for this person, so flag it. On the second line the dates are only 2 days apart, so keep it too. The third line is 11 days after we first saw this account, so we do NOT flag it. The same logic applies to the next person: flag only accounts that are less than 4 days old.

Answer 1

I have rebuilt your data frame; try this solution:

library(lubridate)
library(dplyr)

df <- data.frame(Person = c(rep("abc", 3), rep("eee", 5)),
                 date = c("4/1/2016", "4/3/2016", "4/12/2016", "5/3/2016",
                          "5/4/2016", "5/4/2016", "5/6/2016", "5/10/2016"),
                 account = c("123", "123", "123", "222", "222", "333", "222", "333"),
                 stringsAsFactors = FALSE)

df$date2 <- mdy(df$date)

The best solution, as suggested by @thelatemail:

df %>%
  group_by(Person) %>%
  mutate(keep = as.numeric(date2 - first(date2) <= 4)) %>%
  select(-date2)

Result:

 Person      date account keep
1    abc  4/1/2016     123    1
2    abc  4/3/2016     123    1
3    abc 4/12/2016     123    0
4    eee  5/3/2016     222    1
5    eee  5/4/2016     222    1
6    eee  5/4/2016     333    1
7    eee  5/6/2016     222    1
8    eee 5/10/2016     333    0

My more convoluted original solution (useful if the account creation date is not in the first line for each person):

df %>%
  group_by(Person) %>%
  slice(which.min(date2)) %>%
  select(Person, date2) %>%
  rename(account_create = date2) %>%
  merge(df, ., by = "Person") %>%
  mutate(keep = as.numeric(date2 - account_create <= 4)) %>%
  select(-c(date2, account_create))
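If the only concern is that the earliest date may not be on the first line for each person, a shorter variant is to compare against min(date2) instead of first(date2). This is a minimal sketch rather than part of the original answer: the rows are the same as in the rebuilt data frame, but shuffled on purpose, and as.Date() is used in place of mdy() only to avoid the lubridate dependency.

```r
library(dplyr)

# Same rows as the rebuilt data frame, deliberately out of date order
df <- data.frame(
  Person  = c("abc", "abc", "abc", "eee", "eee", "eee", "eee", "eee"),
  date    = c("4/12/2016", "4/1/2016", "4/3/2016", "5/10/2016",
              "5/4/2016", "5/3/2016", "5/6/2016", "5/4/2016"),
  account = c("123", "123", "123", "333", "222", "222", "222", "333"),
  stringsAsFactors = FALSE
)
df$date2 <- as.Date(df$date, format = "%m/%d/%Y")

res <- df %>%
  group_by(Person) %>%
  # min() finds the earliest date per person regardless of row order
  mutate(keep = as.numeric(date2 - min(date2) <= 4)) %>%
  ungroup() %>%
  select(-date2)
```

Because min() does not depend on row order, no sorting or merge step is needed, and the rows come back in their original order.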

Answer 2

Using data.table:

library(data.table)
setDT(df)[, Keep:=as.numeric(difftime(date,first(date),units="days") < 4), by=Person][]

We group by Person and then create the column Keep, which is 1 when the date is less than 4 days after first(date) for that Person and 0 otherwise.

Here, we assume that the date column is a date-time object. If the date column is read in as character strings, then we can do the conversion using:

df$date <- as.POSIXct(df$date, format="%m/%d/%Y")

With the data given by:

df <- structure(list(Person = c("abc", "abc", "abc", "eee", "eee", 
"eee", "eee", "eee"), date = structure(c(1459483200, 1459656000, 
1460433600, 1462248000, 1462334400, 1462334400, 1462507200, 1462852800
), class = c("POSIXct", "POSIXt"), tzone = ""), account = c(123L, 
123L, 123L, 222L, 222L, 333L, 222L, 333L)), .Names = c("Person", 
"date", "account"), row.names = c(NA, -8L), class = "data.frame")

The result is:

##  Person       date account  Keep
##1    abc 2016-04-01     123     1
##2    abc 2016-04-03     123     1
##3    abc 2016-04-12     123     0
##4    eee 2016-05-03     222     1
##5    eee 2016-05-04     222     1
##6    eee 2016-05-04     333     1
##7    eee 2016-05-06     222     1
##8    eee 2016-05-10     333     0
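A note on why the call above spells out units="days". The following is a minimal sketch, assuming POSIXct dates in UTC (the time zone is my assumption, chosen to keep daylight-saving shifts out of the picture). Subtracting POSIXct values with a plain minus lets R pick the units automatically from the smallest gap, and since each group's differences include a zero (the first date minus itself), the automatic choice is seconds, not days.

```r
# Three dates for one person, as POSIXct (UTC is an assumption for this sketch)
x <- as.POSIXct(c("2016-04-01", "2016-04-03", "2016-04-12"), tz = "UTC")

# Plain subtraction: units are chosen automatically from the smallest gap
auto_diff  <- x - x[1]
auto_units <- units(auto_diff)

# Explicit units, as in the data.table answer
day_diff <- difftime(x, x[1], units = "days")

# The same "< 4" test gives different flags depending on the units
keep_auto <- as.numeric(as.numeric(auto_diff) < 4)
keep_days <- as.numeric(as.numeric(day_diff) < 4)
```

Here keep_days flags the first two rows, while keep_auto flags only the first, because 2 days is 172800 seconds. With Date objects (as produced by mdy() in Answer 1) the difference is always in days, so the issue only arises for date-time columns.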

Answer 3

Thanks for these great ideas; R is amazing, doing this relatively complicated accounting in four lines of code. One thing I did not emphasize is that I also need to keep track of whether it is a new account or not, so I group by account as well as by person. Also, since this data is not necessarily sorted, I sorted it first. Here is the final version:

    df %>%
      # include date2 in the sort so first(date2) is the earliest date per account
      arrange(Person, account, date2) %>%
      group_by(Person, account) %>%
      mutate(keep = as.numeric(date2 - first(date2) < 4)) %>%
      select(-date2)

Result:

    Person      date account  keep
    <chr>     <chr>   <chr> <dbl>
1    abc  4/1/2016     123     1
2    abc  4/3/2016     123     1
3    abc 4/12/2016     123     0
4    eee  5/3/2016     222     1
5    eee  5/4/2016     222     1
6    eee  5/6/2016     222     1
7    eee 5/10/2016     333     1
8    eee 5/11/2016     333     1

So we keep the last line since it is only 1 day from when the 333 account first showed up.
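For anyone who wants to check the per-account logic without dplyr, here is a minimal base R sketch. The rows are copied from the result table above; the helper names (d, days_num, key, first_seen) are mine and purely illustrative.

```r
# Rows taken from the result table above
df <- data.frame(
  Person  = c("abc", "abc", "abc", "eee", "eee", "eee", "eee", "eee"),
  date    = c("4/1/2016", "4/3/2016", "4/12/2016", "5/3/2016",
              "5/4/2016", "5/6/2016", "5/10/2016", "5/11/2016"),
  account = c("123", "123", "123", "222", "222", "222", "333", "333"),
  stringsAsFactors = FALSE
)

# Dates as whole-day numbers so the arithmetic is plain numeric
d        <- as.Date(df$date, format = "%m/%d/%Y")
days_num <- as.numeric(d)

# One grouping key per (Person, account) pair
key <- paste(df$Person, df$account, sep = "_")

# Earliest day on which each account was seen, repeated for every row of the group
first_seen <- ave(days_num, key, FUN = min)

# Flag rows that fall less than 4 days after the account first appeared
df$keep <- as.numeric(days_num - first_seen < 4)
```

Using min() inside ave() means the rows do not have to be sorted beforehand, and the resulting keep column matches the table above.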

3 answers

solution1
1 2016-09-01 22:35:00

solution2
1 2016-09-01 22:41:06

solution3
1 2016-09-02 13:41:05