lag function in R applied on grouped data frame producing only NA

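I am trying to compute, for each customer in a data frame, the difference in the number of visits between semesters, but when I use the dplyr function lag I only get NA. The code I am using is: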

DF %>% group_by(IdCliente_F, Semestre) %>% 
       summarise(Numero_Visitas = n(),
                 Per_Change_Frequency = Numero_Visitas - dplyr::lag(Numero_Visitas))

The result I get is the following:

# A tibble: 12,656 x 4
# Groups:   IdCliente_F [6,192]
   IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
         <dbl> <date>              <int>                <int>
 1           0 2019-07-01             22                   NA
 2      A      2019-01-01              1                   NA
 3      B      2019-07-01              1                   NA
 4      C      2019-01-01              9                   NA
 5      C      2021-01-01              3                   NA
 6      C      2021-07-01              1                   NA
 7      D      2021-07-01              1                   NA
 8      E      2019-07-01              3                   NA
 9      E      2020-01-01              1                   NA
10      E      2020-07-01              5                   NA

I was expecting the following:

# A tibble: 12,656 x 4
# Groups:   IdCliente_F [6,192]
   IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
         <dbl> <date>              <int>                <int>
 1           0 2019-07-01             22                   NA
 2      A      2019-01-01              1                   NA
 3      B      2019-07-01              1                   NA
 4      C      2019-01-01              9                   NA
 5      C      2021-01-01              3                   -6
 6      C      2021-07-01              1                   -2
 7      D      2021-07-01              1                   NA
 8      E      2019-07-01              3                   NA
 9      E      2020-01-01              1                   -2
10      E      2020-07-01              5                    4

I would appreciate any help. Thanks in advance.

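If you look at only one Semestre at a time, you cannot compute the difference between Semestre values: grouping by both IdCliente_F and Semestre leaves a single count per group, so lag has no previous value to return. Try group_by(IdCliente_F) instead.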
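To make the cause concrete, here is a minimal sketch on a hypothetical toy data frame (`visits` and the `Previous` column are illustrative names, not from the question). It assumes dplyr 1.0 or later for the `.groups` argument. Inside each (IdCliente_F, Semestre) group the count is a single value, so its lag is always NA.

```r
library(dplyr)

# Hypothetical toy data: one row per visit (not the asker's real data)
visits <- data.frame(
  IdCliente_F = c("C", "C", "C"),
  Semestre    = c("2019-01-01", "2019-01-01", "2021-01-01")
)

per_semester <- visits %>%
  group_by(IdCliente_F, Semestre) %>%
  summarise(
    Numero_Visitas = n(),                       # one value per group
    Previous       = dplyr::lag(Numero_Visitas), # lag of a length-1 vector is NA
    .groups = "drop"
  )

print(per_semester)
```

Every row of `Previous` is NA here, which is the same pattern as in the question's output.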

You can also simplify the computation (no lag needed) by using just diff:

DF %>%
  group_by(IdCliente_F) %>%
  mutate(Numero_Visitas = n(), Per_Change_Frequency = c(NA, diff(Numero_Visitas))) %>%
  ungroup()
# # A tibble: 10 x 4
#    IdCliente_F Semestre   Numero_Visitas Per_Change_Frequency
#    <chr>       <chr>               <int>                <int>
#  1 0           2019-07-01              1                   NA
#  2 A           2019-01-01              1                   NA
#  3 B           2019-07-01              1                   NA
#  4 C           2019-01-01              3                   NA
#  5 C           2021-01-01              3                    0
#  6 C           2021-07-01              3                    0
#  7 D           2021-07-01              1                   NA
#  8 E           2019-07-01              3                   NA
#  9 E           2020-01-01              3                    0
# 10 E           2020-07-01              3                    0

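In this case the result is hard to see, because the sample data has only 1-3 observations per IdCliente_F: Numero_Visitas = n() counts the rows per client, so it is constant within each client and every difference comes out as 0. The same process should apply to more complete data.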
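If the raw data has one row per visit, one possible way to get a per-semester change is to count per client and semester first, then regroup by client only before differencing. The sketch below is an assumption about the data layout, not the answerer's code; the `visits` data frame is hypothetical.

```r
library(dplyr)

# Hypothetical raw data: one row per visit (not from the original post)
visits <- data.frame(
  IdCliente_F = c("C", "C", "C", "C", "E", "E", "E"),
  Semestre    = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01", "2021-01-01",
                          "2019-07-01", "2020-01-01", "2020-01-01"))
)

result <- visits %>%
  # Step 1: one row per client and semester, holding the visit count
  count(IdCliente_F, Semestre, name = "Numero_Visitas") %>%
  # Step 2: regroup by client only, so lag() can see the previous semester
  group_by(IdCliente_F) %>%
  arrange(Semestre, .by_group = TRUE) %>%
  mutate(Per_Change_Frequency = Numero_Visitas - dplyr::lag(Numero_Visitas)) %>%
  ungroup()

print(result)
```

The first semester of each client stays NA because there is nothing earlier to compare against.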


Data

DF <- structure(list(IdCliente_F = c("0", "A", "B", "C", "C", "C", "D", "E", "E", "E"), Semestre = c("2019-07-01", "2019-01-01", "2019-07-01", "2019-01-01", "2021-01-01", "2021-07-01", "2021-07-01", "2019-07-01", "2020-01-01", "2020-07-01"), Numero_Visitas = c(22L, 1L, 1L, 9L, 3L, 1L, 1L, 3L, 1L, 5L), Per_Change_Frequency = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
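Because this sample already holds one count per client and semester, a sketch that keeps Numero_Visitas as posted and only groups by IdCliente_F should reproduce the output the question expected. The data frame is rebuilt here with data.frame() so the block runs on its own; `out` is an illustrative name.

```r
library(dplyr)

# Same values as the posted sample, without the all-NA column
DF <- data.frame(
  IdCliente_F    = c("0", "A", "B", "C", "C", "C", "D", "E", "E", "E"),
  Semestre       = c("2019-07-01", "2019-01-01", "2019-07-01", "2019-01-01",
                     "2021-01-01", "2021-07-01", "2021-07-01", "2019-07-01",
                     "2020-01-01", "2020-07-01"),
  Numero_Visitas = c(22L, 1L, 1L, 9L, 3L, 1L, 1L, 3L, 1L, 5L)
)

out <- DF %>%
  group_by(IdCliente_F) %>%   # group by client only, not by Semestre
  mutate(Per_Change_Frequency = Numero_Visitas - dplyr::lag(Numero_Visitas)) %>%
  ungroup()

out$Per_Change_Frequency
# NA NA NA NA -6 -2 NA NA -2  4
```

This relies on the rows already being in semester order within each client, as they are in the sample.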
