(R) Count Gaps in Between Months

I have a dataset that looks like the sample below. The MonthYear variable just describes the first occurrence of the UniqueID. The Count1 column is a running count of the occurrences of each UniqueID. The Count2 column holds the total number of occurrences of each UniqueID. The MonthCount variable just assigns a number based on the month. I started collecting this information in October 2018, so that month gets a 1, November 2018 gets a 2, and so on.

Note: No data was available in either July 2019 or October 2019, so those months are not numbered: August 2019 gets a value of 10 and November 2019 gets a value of 12.

UniqueID Region City MonthYear Count1 Count2 MonthCount
ABC123   West   AAA  OCT-18    1      4      1
ABC123   West   AAA  NOV-18    2      4      2
ABC123   West   AAA  DEC-18    3      4      3 
ABC123   West   AAA  JAN-19    4      4      4
DEF456   East   BBB  DEC-18    1      3      3 
DEF456   East   BBB  JAN-19    2      3      4
DEF456   East   BBB  MAR-19    3      3      6
GHI789   East   CCC  JAN-19    1      4      4
GHI789   East   CCC  FEB-19    2      4      5
GHI789   East   CCC  APR-19    3      4      7
GHI789   East   CCC  JUN-19    4      4      9
JKL012   South  DDD  AUG-19    1      4      10 
JKL012   South  DDD  SEP-19    2      4      11
JKL012   South  DDD  NOV-19    3      4      12
JKL012   South  DDD  DEC-19    4      4      13 
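
For readers wondering how a MonthCount column like this could be derived from MonthYear, here is a minimal sketch in base R. It assumes every MonthYear value has the form MON-YY with an uppercase English month abbreviation and a year in the 2000s. The names `cal_index`, `no_data`, and `month_count` are illustrative and do not come from the original post.

```r
# A few MonthYear values taken from the sample table
month_year <- c("OCT-18", "NOV-18", "JUN-19", "AUG-19", "SEP-19", "NOV-19", "DEC-19")

# Split "MON-YY" into a month number (1-12) and a four-digit year
mon <- match(substr(month_year, 1, 3), toupper(month.abb))
yr  <- 2000L + as.integer(substr(month_year, 5, 6))

# Calendar position relative to October 2018, which is defined as 1
start     <- 2018L * 12L + 10L
cal_index <- yr * 12L + mon - start + 1L

# Calendar positions of the months with no data at all:
# July 2019 is position 10 and October 2019 is position 13 (assumption from the note above)
no_data <- c(10L, 13L)

# Shift each month down by the number of no-data months that precede it
month_count <- cal_index - sapply(cal_index, function(i) sum(no_data < i))

print(month_count)
```

With this numbering, the two months without data never create a gap of their own, which is why they can be ignored when counting skipped months later on.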

What I want to do is count the total number of times a month is skipped per UniqueID (with the exception of JUL19 and OCT19, which have no data). The result would look like the following:

UniqueID Region City MonthYear Count1 Count2 MonthCount Skipped
ABC123   West   AAA  OCT-18    1      4      1          0      
ABC123   West   AAA  NOV-18    2      4      2          0
ABC123   West   AAA  DEC-18    3      4      3          0
ABC123   West   AAA  JAN-19    4      4      4          0
DEF456   East   BBB  DEC-18    1      3      3          1
DEF456   East   BBB  JAN-19    2      3      4          1
DEF456   East   BBB  MAR-19    3      3      6          1 
GHI789   East   CCC  JAN-19    1      4      4          2
GHI789   East   CCC  FEB-19    2      4      5          2
GHI789   East   CCC  APR-19    3      4      7          2
GHI789   East   CCC  JUN-19    4      4      9          2
JKL012   South  DDD  AUG-19    1      4      10         0 
JKL012   South  DDD  SEP-19    2      4      11         0
JKL012   South  DDD  NOV-19    3      4      12         0
JKL012   South  DDD  DEC-19    4      4      13         0

Any help would be appreciated! I'm not sure where to start. Thank you!

After grouping by 'UniqueID', we can take the diff of 'MonthCount', check which values are greater than 1 (i.e. adjacent months that differ by more than 1), and sum the resulting logical vector. This assumes the rows are already ordered by 'MonthCount' within each 'UniqueID', as they are in the example.

library(dplyr)

# Per UniqueID, count adjacent rows whose MonthCount differs by more than 1
df1 %>% 
   group_by(UniqueID) %>%
   mutate(Skipped = sum(diff(MonthCount) > 1))
# A tibble: 15 x 8
# Groups:   UniqueID [4]
#   UniqueID Region City  MonthYear Count1 Count2 MonthCount Skipped
#   <chr>    <chr>  <chr> <chr>      <int>  <int>      <int>   <int>
# 1 ABC123   West   AAA   OCT-18         1      4          1       0
# 2 ABC123   West   AAA   NOV-18         2      4          2       0
# 3 ABC123   West   AAA   DEC-18         3      4          3       0
# 4 ABC123   West   AAA   JAN-19         4      4          4       0
# 5 DEF456   East   BBB   DEC-18         1      3          3       1
# 6 DEF456   East   BBB   JAN-19         2      3          4       1
# 7 DEF456   East   BBB   MAR-19         3      3          6       1
# 8 GHI789   East   CCC   JAN-19         1      4          4       2
# 9 GHI789   East   CCC   FEB-19         2      4          5       2
#10 GHI789   East   CCC   APR-19         3      4          7       2
#11 GHI789   East   CCC   JUN-19         4      4          9       2
#12 JKL012   South  DDD   AUG-19         1      4         10       0
#13 JKL012   South  DDD   SEP-19         2      4         11       0
#14 JKL012   South  DDD   NOV-19         3      4         12       0
#15 JKL012   South  DDD   DEC-19         4      4         13       0
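
To make the `diff` step concrete, the sketch below walks through it for a single ID and then shows one possible base R equivalent using `ave()`. The variable names (`mc`, `steps`, `n_gaps`, `n_missing`, `small`) are illustrative only, and `small` is a cut-down copy of the sample data rather than the full `df1`.

```r
# MonthCount values for one ID (GHI789 in the sample data)
mc <- c(4L, 5L, 7L, 9L)

steps     <- diff(mc)          # differences between adjacent rows: 1 2 2
n_gaps    <- sum(steps > 1)    # number of gaps, which is what Skipped holds
n_missing <- sum(steps - 1L)   # total months missing inside those gaps

# Base R alternative to the grouped mutate, on a reduced copy of the data
small <- data.frame(
  UniqueID   = c("DEF456", "DEF456", "DEF456",
                 "GHI789", "GHI789", "GHI789", "GHI789"),
  MonthCount = c(3L, 4L, 6L, 4L, 5L, 7L, 9L)
)

# ave() applies FUN per group and recycles the single result over the group's rows
small$Skipped <- ave(small$MonthCount, small$UniqueID,
                     FUN = function(x) sum(diff(x) > 1))

print(small)
```

In the sample data every gap is exactly one month wide, so `n_gaps` and `n_missing` agree. If an ID could jump from, say, 4 to 7, the first would count one gap while the second would count two missing months, so it is worth deciding which of the two is actually wanted.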

data

df1 <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123", 
"DEF456", "DEF456", "DEF456", "GHI789", "GHI789", "GHI789", "GHI789", 
"JKL012", "JKL012", "JKL012", "JKL012"), Region = c("West", "West", 
"West", "West", "East", "East", "East", "East", "East", "East", 
"East", "South", "South", "South", "South"), City = c("AAA", 
"AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC", 
"CCC", "DDD", "DDD", "DDD", "DDD"), MonthYear = c("OCT-18", "NOV-18", 
"DEC-18", "JAN-19", "DEC-18", "JAN-19", "MAR-19", "JAN-19", "FEB-19", 
"APR-19", "JUN-19", "AUG-19", "SEP-19", "NOV-19", "DEC-19"), 
    Count1 = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L, 
    2L, 3L, 4L), Count2 = c(4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L), MonthCount = c(1L, 2L, 3L, 4L, 3L, 
    4L, 6L, 4L, 5L, 7L, 9L, 10L, 11L, 12L, 13L)), class = "data.frame",
    row.names = c(NA, 
-15L))
