
lag function in R applied on grouped data frame producing only NA

I am trying to calculate the change in the number of visits between semesters for each client in a data frame, but I get only NA when using the dplyr function lag. The code I am using is:

DF %>% group_by(IdCliente_F, Semestre) %>% 
       summarise(Numero_Visitas = n(),
                 Per_Change_Frequency = Numero_Visitas - dplyr::lag(Numero_Visitas))

The result I am getting is the following:

# A tibble: 12,656 x 4
# Groups:   IdCliente_F [6,192]
   IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
         <dbl> <date>              <int>                <int>
 1           0 2019-07-01             22                   NA
 2      A      2019-01-01              1                   NA
 3      B      2019-07-01              1                   NA
 4      C      2019-01-01              9                   NA
 5      C      2021-01-01              3                   NA
 6      C      2021-07-01              1                   NA
 7      D      2021-07-01              1                   NA
 8      E      2019-07-01              3                   NA
 9      E      2020-01-01              1                   NA
10      E      2020-07-01              5                   NA

And I am expecting the following:

# A tibble: 12,656 x 4
# Groups:   IdCliente_F [6,192]
   IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
         <dbl> <date>              <int>                <int>
 1           0 2019-07-01             22                   NA
 2      A      2019-01-01              1                   NA
 3      B      2019-07-01              1                   NA
 4      C      2019-01-01              9                   NA
 5      C      2021-01-01              3                   -6
 6      C      2021-07-01              1                   -2
 7      D      2021-07-01              1                   NA
 8      E      2019-07-01              3                   NA
 9      E      2020-01-01              1                   -2
10      E      2020-07-01              5                    4

I appreciate any help. Thanks in advance

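You cannot calculate the difference between semesters if each group holds only one Semestre. With group_by(IdCliente_F, Semestre), every group inside summarise contains a single Numero_Visitas value, so dplyr::lag has no previous value to return and gives NA. Compute the difference with the data grouped by group_by(IdCliente_F) only.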
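A minimal sketch of why the grouping matters, using plain vectors in place of groups. The variable names are illustrative only; the values are taken from the sample output in the question.

```r
library(dplyr)

# What a (client, semester) group holds after counting: a single value.
one_semester <- 22L
lag_one <- dplyr::lag(one_semester)  # nothing precedes the only element

# What a client-level group holds: all of that client's semester counts.
client_c <- c(9L, 3L, 1L)
lag_c <- dplyr::lag(client_c)        # each element shifted down by one

print(lag_one)
print(lag_c)
```

Only the second case has earlier elements to subtract from, which is why the difference has to be computed per client rather than per client and semester.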

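You can also simplify the calculation: there is no need for lag, because diff already returns the differences between consecutive values. It returns one element fewer than its input, so pad the front with NA.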
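As a quick check of that equivalence, here is a small sketch on a single vector (client E's counts from the sample data; the variable names are made up for illustration):

```r
library(dplyr)

x <- c(3L, 1L, 5L)  # client E's visits per semester

via_lag  <- x - dplyr::lag(x)  # subtract the previous value
via_diff <- c(NA, diff(x))     # consecutive differences, padded with NA

print(identical(via_lag, via_diff))
```

Either form can go inside mutate; the choice is mostly a matter of taste.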

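# The sample DF below already has one row per client and semester, so
# Numero_Visitas is used as-is. Recomputing it with n() inside mutate would
# overwrite it with the group size and make every difference 0.
# If DF has one row per visit instead, count visits per IdCliente_F and
# Semestre first, then apply the same mutate.
DF %>%
  group_by(IdCliente_F) %>%
  mutate(Per_Change_Frequency = c(NA, diff(Numero_Visitas))) %>%
  ungroup()
# # A tibble: 10 x 4
#    IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
#    <chr>       <chr>               <int>                <int>
#  1 0           2019-07-01             22                   NA
#  2 A           2019-01-01              1                   NA
#  3 B           2019-07-01              1                   NA
#  4 C           2019-01-01              9                   NA
#  5 C           2021-01-01              3                   -6
#  6 C           2021-07-01              1                   -2
#  7 D           2021-07-01              1                   NA
#  8 E           2019-07-01              3                   NA
#  9 E           2020-01-01              1                   -2
# 10 E           2020-07-01              5                    4

In the sample data, clients with a single semester (0, A, B, D) have no earlier semester to compare against, so they stay NA. The same process works for fuller data, provided the rows are ordered by Semestre within each IdCliente_F.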
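For raw data with one row per visit, which is what the n() call in the question suggests, a sketch of the whole pipeline might look like the following. The visits data frame is invented for illustration and is not the asker's real DF; count() and arrange(.by_group = TRUE) are standard dplyr calls.

```r
library(dplyr)

# Hypothetical visit-level data: one row per visit (an assumption).
visits <- data.frame(
  IdCliente_F = c("C", "C", "C", "C", "E", "E", "E"),
  Semestre = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2021-01-01",
                       "2019-07-01", "2020-01-01", "2020-01-01"))
)

result <- visits %>%
  # Step 1: collapse to one row per client and semester with the visit count.
  count(IdCliente_F, Semestre, name = "Numero_Visitas") %>%
  # Step 2: within each client, compare each semester with the previous one.
  group_by(IdCliente_F) %>%
  arrange(Semestre, .by_group = TRUE) %>%
  mutate(Per_Change_Frequency = Numero_Visitas - dplyr::lag(Numero_Visitas)) %>%
  ungroup()

print(result)
```

Counting and differencing are kept as separate steps so that the difference is taken across a client's semesters rather than inside a single-semester group.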


Data

DF <- structure(list(IdCliente_F = c("0", "A", "B", "C", "C", "C", "D", "E", "E", "E"), Semestre = c("2019-07-01", "2019-01-01", "2019-07-01", "2019-01-01", "2021-01-01", "2021-07-01", "2021-07-01", "2019-07-01", "2020-01-01", "2020-07-01"), Numero_Visitas = c(22L, 1L, 1L, 9L, 3L, 1L, 1L, 3L, 1L, 5L), Per_Change_Frequency = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
