[英](R) Count Gaps in Between Months
I have a dataset that looks like the sample below.我有一个数据集,看起来像下面的示例。 The MonthYear variable just describes the first occurrence of the UniqueID.
MonthYear 变量只描述第一次出现的 UniqueID。 The Count1 column counts the number of occurrences of the UniqueID.
Count1 列计算 UniqueID 出现的次数。 The Count2 column counts the total number of occurrences of each UniqueID.
Count2 列计算每个 UniqueID 出现的总次数。 The MonthCount variable just assigns a number based on the month.
MonthCount 变量只是根据月份分配一个数字。 I started collecting this information in October 2018, so that value would get a 1 and then November 2018 would get a 2, and so on.
我于 2018 年 10 月开始收集此信息,因此该值将获得 1,然后 2018 年 11 月将获得 2,依此类推。
Note: No data was available in both July 2019 and October 2019, so August 2019 gets a value of 10 and November 2019 gets a value of 12.注意:2019 年 7 月和 2019 年 10 月都没有可用数据,因此 2019 年 8 月的值为 10,2019 年 11 月的值为 12。
UniqueID Region City MonthYear Count1 Count2 MonthCount
ABC123 West AAA OCT-18 1 4 1
ABC123 West AAA NOV-18 2 4 2
ABC123 West AAA DEC-18 3 4 3
ABC123 West AAA JAN-19 4 4 4
DEF456 East BBB DEC-18 1 3 3
DEF456 East BBB JAN-19 2 3 4
DEF456 East BBB MAR-19 3 3 6
GHI789 East CCC JAN-19 1 4 4
GHI789 East CCC FEB-19 2 4 5
GHI789 East CCC APR-19 3 4 7
GHI789 East CCC JUN-19 4 4 9
JKL012 South DDD AUG-19 1 4 10
JKL012 South DDD SEP-19 2 4 11
JKL012 South DDD NOV-19 3 4 12
JKL012 South DDD DEC-19 4 4 13
What I want to do is count the total number of times a month is skipped per UniqueID (with the exception of JUL19 and OCT19).我想要做的是计算每个 UniqueID 每月跳过的总次数(JUL19 和 OCT19 除外)。 I would have a dataset that looks like the following:
我会有一个如下所示的数据集:
UniqueID Region City MonthYear Count1 Count2 MonthCount Skipped
ABC123 West AAA OCT-18 1 4 1 0
ABC123 West AAA NOV-18 2 4 2 0
ABC123 West AAA DEC-18 3 4 3 0
ABC123 West AAA JAN-19 4 4 4 0
DEF456 East BBB DEC-18 1 3 3 1
DEF456 East BBB JAN-19 2 3 4 1
DEF456 East BBB MAR-19 3 3 6 1
GHI789 East CCC JAN-19 1 4 4 2
GHI789 East CCC FEB-19 2 4 5 2
GHI789 East CCC APR-19 3 4 7 2
GHI789 East CCC JUN-19 4 4 9 2
JKL012 South DDD AUG-19 1 4 10 0
JKL012 South DDD SEP-19 2 4 11 0
JKL012 South DDD NOV-19 3 4 12 0
JKL012 South DDD DEC-19 4 4 13 0
Any help would be appreciated!任何帮助,将不胜感激! I'm not sure where to start.
我不知道从哪里开始。 Thank you!
谢谢!
After grouping by 'UniqueID', can get the diff
of the 'MonthCount', check if any value is greater than 1 ie there is a difference in adjacent months greater than 1, and sum
the logical vector
按'UniqueID'分组后,可以得到'MonthCount'的
diff
,检查是否有大于1的值,即相邻月份的差值大于1,并sum
逻辑vector
df1 %>%
group_by(UniqueID) %>%
mutate(Skipped = sum(diff(MonthCount) > 1))
# A tibble: 15 x 8
# Groups: UniqueID [4]
# UniqueID Region City MonthYear Count1 Count2 MonthCount Skipped
# <chr> <chr> <chr> <chr> <int> <int> <int> <int>
# 1 ABC123 West AAA OCT-18 1 4 1 0
# 2 ABC123 West AAA NOV-18 2 4 2 0
# 3 ABC123 West AAA DEC-18 3 4 3 0
# 4 ABC123 West AAA JAN-19 4 4 4 0
# 5 DEF456 East BBB DEC-18 1 3 3 1
# 6 DEF456 East BBB JAN-19 2 3 4 1
# 7 DEF456 East BBB MAR-19 3 3 6 1
# 8 GHI789 East CCC JAN-19 1 4 4 2
# 9 GHI789 East CCC FEB-19 2 4 5 2
#10 GHI789 East CCC APR-19 3 4 7 2
#11 GHI789 East CCC JUN-19 4 4 9 2
#12 JKL012 South DDD AUG-19 1 4 10 0
#13 JKL012 South DDD SEP-19 2 4 11 0
#14 JKL012 South DDD NOV-19 3 4 12 0
#15 JKL012 South DDD DEC-19 4 4 13 0
df1 <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123",
"DEF456", "DEF456", "DEF456", "GHI789", "GHI789", "GHI789", "GHI789",
"JKL012", "JKL012", "JKL012", "JKL012"), Region = c("West", "West",
"West", "West", "East", "East", "East", "East", "East", "East",
"East", "South", "South", "South", "South"), City = c("AAA",
"AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC",
"CCC", "DDD", "DDD", "DDD", "DDD"), MonthYear = c("OCT-18", "NOV-18",
"DEC-18", "JAN-19", "DEC-18", "JAN-19", "MAR-19", "JAN-19", "FEB-19",
"APR-19", "JUN-19", "AUG-19", "SEP-19", "NOV-19", "DEC-19"),
Count1 = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L), Count2 = c(4L, 4L, 4L, 4L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L), MonthCount = c(1L, 2L, 3L, 4L, 3L,
4L, 6L, 4L, 5L, 7L, 9L, 10L, 11L, 12L, 13L)), class = "data.frame",
row.names = c(NA,
-15L))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.