I am trying to calculate the change in the number of visits between semesters for each client in a data frame, but I get only NA when using the dplyr function lag. The code I am using is:
DF %>% group_by(IdCliente_F, Semestre) %>%
summarise(Numero_Visitas = n(),
Per_Change_Frequency = Numero_Visitas - dplyr::lag(Numero_Visitas))
The result I am getting is the following:
# A tibble: 12,656 x 4
# Groups:   IdCliente_F [6,192]
   IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
   <dbl>       <date>              <int>                <int>
 1 0           2019-07-01             22                   NA
 2 A           2019-01-01              1                   NA
 3 B           2019-07-01              1                   NA
 4 C           2019-01-01              9                   NA
 5 C           2021-01-01              3                   NA
 6 C           2021-07-01              1                   NA
 7 D           2021-07-01              1                   NA
 8 E           2019-07-01              3                   NA
 9 E           2020-01-01              1                   NA
10 E           2020-07-01              5                   NA
And I am expecting the following:
# A tibble: 12,656 x 4
# Groups:   IdCliente_F [6,192]
   IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
   <dbl>       <date>              <int>                <int>
 1 0           2019-07-01             22                   NA
 2 A           2019-01-01              1                   NA
 3 B           2019-07-01              1                   NA
 4 C           2019-01-01              9                   NA
 5 C           2021-01-01              3                   -6
 6 C           2021-07-01              1                   -2
 7 D           2021-07-01              1                   NA
 8 E           2019-07-01              3                   NA
 9 E           2020-01-01              1                   -2
10 E           2020-07-01              5                    4
I appreciate any help. Thanks in advance
You cannot calculate the difference between Semestre values if you are only looking at one Semestre at a time. Try grouping by the client only: group_by(IdCliente_F).
You can also simplify the calculation by using diff instead of lag.
DF %>%
group_by(IdCliente_F) %>%
mutate(Numero_Visitas = n(), Per_Change_Frequency = c(NA, diff(Numero_Visitas))) %>%
ungroup()
# # A tibble: 10 x 4
#    IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
#    <chr>       <chr>               <int>                <int>
#  1 0           2019-07-01              1                   NA
#  2 A           2019-01-01              1                   NA
#  3 B           2019-07-01              1                   NA
#  4 C           2019-01-01              3                   NA
#  5 C           2021-01-01              3                    0
#  6 C           2021-07-01              3                    0
#  7 D           2021-07-01              1                   NA
#  8 E           2019-07-01              3                   NA
#  9 E           2020-01-01              3                    0
# 10 E           2020-07-01              3                    0
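In this case, every difference is 0 because mutate(Numero_Visitas = n()) overwrites the column with the number of rows per IdCliente_F, which is constant within each group. The sample data is already summarised (one row per IdCliente_F and Semestre), so on it you would drop the n() step and difference the existing Numero_Visitas column. On visit-level data, count the visits per IdCliente_F and Semestre first, then take the difference within each IdCliente_F.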
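For visit-level data, one possible way to do the two steps (count per client and semester, then difference within each client) is sketched below. The `visits` data frame is hypothetical and not from the original post, since the raw data was not shared; only the column names are taken from the question.

```r
library(dplyr)

# Hypothetical visit-level data: one row per visit (an assumption for illustration)
visits <- data.frame(
  IdCliente_F = c("C", "C", "C", "E", "E", "E", "E"),
  Semestre    = as.Date(c("2019-01-01", "2019-01-01", "2021-01-01",
                          "2019-07-01", "2020-01-01", "2020-01-01", "2020-01-01"))
)

result <- visits %>%
  # Step 1: one row per client and semester, holding the number of visits
  count(IdCliente_F, Semestre, name = "Numero_Visitas") %>%
  # Step 2: within each client, compare each semester with the previous one
  group_by(IdCliente_F) %>%
  arrange(Semestre, .by_group = TRUE) %>%
  mutate(Per_Change_Frequency = Numero_Visitas - dplyr::lag(Numero_Visitas)) %>%
  ungroup()

result
```

The first semester of each client has no predecessor, so its Per_Change_Frequency stays NA. The difference has to be computed in a mutate step after the counting, because inside a summarise grouped by both columns each group holds a single count and lag has nothing to look back to.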
Data
DF <- structure(list(IdCliente_F = c("0", "A", "B", "C", "C", "C", "D", "E", "E", "E"), Semestre = c("2019-07-01", "2019-01-01", "2019-07-01", "2019-01-01", "2021-01-01", "2021-07-01", "2021-07-01", "2019-07-01", "2020-01-01", "2020-07-01"), Numero_Visitas = c(22L, 1L, 1L, 9L, 3L, 1L, 1L, 3L, 1L, 5L), Per_Change_Frequency = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
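Because the sample data above already has one row per IdCliente_F and Semestre, a minimal sketch on it could keep the existing Numero_Visitas column and only difference it. This assumes the rows are already in Semestre order within each client, and it rebuilds DF without the empty Per_Change_Frequency column for brevity.

```r
library(dplyr)

# Sample data from the post, rebuilt without the all-NA Per_Change_Frequency column
DF <- data.frame(
  IdCliente_F = c("0", "A", "B", "C", "C", "C", "D", "E", "E", "E"),
  Semestre = c("2019-07-01", "2019-01-01", "2019-07-01", "2019-01-01",
               "2021-01-01", "2021-07-01", "2021-07-01", "2019-07-01",
               "2020-01-01", "2020-07-01"),
  Numero_Visitas = c(22L, 1L, 1L, 9L, 3L, 1L, 1L, 3L, 1L, 5L)
)

out <- DF %>%
  group_by(IdCliente_F) %>%
  # Keep Numero_Visitas as is; pad with NA because diff() returns one fewer value
  mutate(Per_Change_Frequency = c(NA, diff(Numero_Visitas))) %>%
  ungroup()

out$Per_Change_Frequency
# NA NA NA NA -6 -2 NA NA -2  4, the values the question expects
```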