
Add empty rows for gaps between subscriptions

I have been struggling with this for a while now and haven't been able to find a comparable question anywhere, hence my first question here!

I'm fairly new to R, so please excuse any obvious errors I have made.

I have a dataset with one row for each subscription that a user has or has had. Some users have multiple rows, while others have only one. Only active or previously active subscriptions are present.

I have two variables, Begindate and Enddate, which state when each subscription started and ended. I have already created relationlength variables that give the number of days between these two dates for each type of subscription. This means the relationlength variables only count the days during which a subscription was active.
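In the sample data below, the rl_* values appear to be the plain day difference Enddate - Begindate, so the end day itself is not counted (for example, 2019-08-26 to 2022-04-20 gives 968 and 2019-08-26 to 2019-09-12 gives 17). A minimal sketch of that calculation, assuming the dates are ISO strings; `day_diff` is a hypothetical helper name, not something from the original data:

```r
# Hypothetical helper: days between two ISO date strings, computed as
# Enddate - Begindate (the end day itself is not counted)
day_diff <- function(begin, end) {
  as.numeric(as.Date(end) - as.Date(begin), units = "days")
}

day_diff("2019-08-26", "2022-04-20")  # 968
day_diff("2018-08-24", "2019-08-23")  # 364
day_diff("2019-08-26", "2019-09-12")  # 17
```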

What I would like to do is create empty rows between the different subscription rows for the periods in which no subscription was active, starting from the earliest Begindate known for the specific user and ending on a fixed date on which all subscriptions end (20-04-2022).

I have tried to take the date difference between the first Begindate known for a user and the final date, and then subtract the relation lengths known for the other subscription types. However, I could not make this work.
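The "total span minus active days" idea can be made robust to overlapping or back-to-back subscriptions by counting calendar days instead of subtracting lengths. A minimal sketch, assuming one user's Begindate/Enddate values as ISO strings, the 20-04-2022 cut-off from the question, and that both the start and end day of a subscription count as active; `uncovered_days` is a hypothetical helper name:

```r
# Hypothetical helper: number of days from the user's first Begindate up to
# the cut-off (both inclusive) on which no subscription was active
uncovered_days <- function(begin, end, cutoff = "2022-04-20") {
  begin <- as.Date(begin)
  end   <- as.Date(end)
  # Every calendar day covered by at least one subscription
  covered <- unique(do.call(c, Map(seq, begin, end, by = "day")))
  # Every calendar day in the observation window
  window <- seq(min(begin), as.Date(cutoff), by = "day")
  sum(!window %in% covered)
}

# User 1 from the sample table below
uncovered_days(c("2019-08-26", "2018-08-24", "2015-08-24"),
               c("2022-04-20", "2019-08-23", "2016-08-23"))
```

Because both ends are counted, this yields per-day totals: the gap from 2019-08-24 to 2019-08-25 contributes 2 days, whereas an Enddate - Begindate value for the same span would be 1.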

An example of what the df currently looks like (rl stands for relationlength):

ID Begindate Enddate Subscrtype active rl_fixed rl_promotional Productgroup

1 2019-08-26 2022-04-20 fixed   1      968      0              1
1 2018-08-24 2019-08-23 fixed   0      364      0              1
1 2015-08-24 2016-08-23 promo   0      0        364            2
2 2019-08-26 2019-09-12 fixed   0      17       0              1
2 2018-08-24 2019-08-23 fixed   0      364      0              1

What I would like it to look like:

ID Begindate Enddate Subscrtype active rl_fixed rl_promo rl_none Productgroup

1 2019-08-26 2022-04-20 fixed   1      968      0        0       1
1 2019-08-24 2019-08-25 none    0      0        0        2       NA
1 2018-08-24 2019-08-23 fixed   0      364      0        0       1
1 2016-08-24 2018-08-23 none    0      0        0        729     NA
1 2015-08-24 2016-08-23 promo   0      0        364      0       2
2 2019-09-13 2022-04-20 none    0      0        0        950     NA
2 2019-08-26 2019-09-12 fixed   0      17       0        0       1
2 2019-08-24 2019-08-25 none    0      0        0        2       NA
2 2018-08-24 2019-08-23 fixed   0      364      0        0       1
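The target layout above can be sketched per user by sorting the subscriptions, tracking the last day covered so far, and emitting a "none" row wherever the next subscription (or the 2022-04-20 cut-off taken from the question) starts later than the day after that. This is a minimal sketch under several assumptions: `gap_rows` and `add_gaps` are hypothetical names; gap rows get 0 for `active` and the rl_* columns and NA elsewhere, mirroring the table above; rl_none is appended as the last column; and rl_none uses the same Enddate - Begindate convention as the other rl_* columns, so the gap 2019-08-24 to 2019-08-25 gets 1 rather than the 2 shown above (2016-08-24 to 2018-08-23 gets 729, as above).

```r
# Hypothetical helper: "none" rows for one user's periods without an active
# subscription, from the first Begindate up to the cut-off date
gap_rows <- function(id, begin, end, cutoff = "2022-04-20") {
  o <- order(as.Date(begin))                  # oldest subscription first
  b <- as.numeric(as.Date(begin)[o])          # days since 1970-01-01
  e <- as.numeric(as.Date(end)[o])
  covered_to <- cummax(e)                     # last covered day so far (handles overlaps)
  gap_start  <- covered_to + 1                # day after coverage stops
  gap_end    <- c(b[-1] - 1, as.numeric(as.Date(cutoff)))  # day before next start, or cut-off
  keep <- gap_end >= gap_start                # drop back-to-back and overlapping pairs
  data.frame(
    ID         = rep(id, sum(keep)),
    Begindate  = as.character(as.Date(gap_start[keep], origin = "1970-01-01")),
    Enddate    = as.character(as.Date(gap_end[keep], origin = "1970-01-01")),
    Subscrtype = rep("none", sum(keep)),
    rl_none    = gap_end[keep] - gap_start[keep],  # same convention as rl_*
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: add the gap rows for every user
add_gaps <- function(df, cutoff = "2022-04-20") {
  df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)
  df$rl_none <- 0
  gaps <- do.call(rbind, lapply(split(df, df$ID), function(v)
    gap_rows(v$ID[1], v$Begindate, v$Enddate, cutoff)))
  # Gap rows get 0 for active and the rl_* columns, NA for everything else
  for (col in setdiff(names(df), names(gaps))) {
    fill <- if (col == "active" || startsWith(col, "rl_")) 0 else NA
    gaps[[col]] <- rep(fill, nrow(gaps))
  }
  out <- rbind(df, gaps[names(df)])
  out <- out[order(out$ID, -as.numeric(as.Date(out$Begindate))), ]  # newest first per user
  rownames(out) <- NULL
  out
}

# Sample data from the question
df <- data.frame(
  ID           = c(1L, 1L, 1L, 2L, 2L),
  Begindate    = c("2019-08-26", "2018-08-24", "2015-08-24", "2019-08-26", "2018-08-24"),
  Enddate      = c("2022-04-20", "2019-08-23", "2016-08-23", "2019-09-12", "2019-08-23"),
  Subscrtype   = c("fixed", "fixed", "promo", "fixed", "fixed"),
  active       = c(1L, 0L, 0L, 0L, 0L),
  rl_fixed     = c(968L, 364L, 0L, 17L, 364L),
  rl_promo     = c(0L, 0L, 364L, 0L, 0L),
  Productgroup = c(1L, 1L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)
res <- add_gaps(df)
res
```

Sorting inside `gap_rows` means the input order of the rows does not matter, and a user with a single row still gets a trailing gap if that subscription ended before the cut-off.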

The end goal is to aggregate the data and get a clear overview of the relation lengths for each type of relation a user can have.
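Once the gap rows exist, the aggregation step could be a per-user sum of the rl_* columns. A minimal sketch using base R's `aggregate()` formula method, assuming a data frame in the desired layout; the values are copied from the desired table above and `wanted`/`totals` are illustrative names:

```r
# Desired layout from the question (only the columns needed here)
wanted <- data.frame(
  ID       = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  rl_fixed = c(968, 0, 364, 0, 0, 0, 17, 0, 364),
  rl_promo = c(0, 0, 0, 0, 364, 0, 0, 0, 0),
  rl_none  = c(0, 2, 0, 729, 0, 950, 0, 2, 0)
)

# Total days per relation type for each user
totals <- aggregate(cbind(rl_fixed, rl_promo, rl_none) ~ ID, data = wanted, FUN = sum)
totals
```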

Thank you in advance!

dput for one specific user in the real df:

structure(list(ï..CRM.relatienummer = structure(c(1L, 1L, 1L, 
1L, 1L, 1L), .Label = "1", class = "factor"), Begindatum = c("2019-08-26", 
"2018-08-24", "2017-08-24", "2016-08-24", "2015-08-20", "2016-06-01"
), Einddatum = c("2022-04-20", "2019-08-23", "2018-08-23", "2017-08-23", 
"2016-05-31", "2016-08-19"), Type.abonnement = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = "Actie", class = "factor"), Status_dummy = c(1, 
0, 0, 0, 0, 0), relationlength_fixed = c(0, 0, 0, 0, 0, 0), relationlength_promo = c(968, 
364, 364, 364, 285, 79), relationlength_trial = c(0, 0, 0, 0, 
0, 0), fixed_dummy = c(0, 0, 0, 0, 0, 0), trial_dummy = c(0, 
0, 0, 0, 0, 0), promotional_dummy = c(1, 1, 1, 1, 1, 1)), row.names = c("1:20610", 
"2:38646", "2:39231", "2:39232", "2:39248", "2:39837"), class = "data.frame")

Edit:

I have tried to run this code:

dfs <- split(testdata,testdata$ï..CRM.relatienummer)

r <- lapply(seq(length(dfs)), function(k){
  v <- dfs[[k]]
  vt <- data.frame(unique(v$ï..CRM.relatienummer), 
                   as.character((as.Date(v$Einddatum)+1)[-1]), 
                   as.character((as.Date(v$Begindatum)-1)[-nrow(v)]), 
                   0,
                   0,
                   0,
                   0,
                   (as.Date(v$Begindatum)-1)[-nrow(v)] - (as.Date(v$Einddatum)+1)[-1],
                   NA,
                   0,
                   0,
                   0,
                   0,
                   0)
  colnames(vt) <- c(colnames(v)[-ncol(v)],"rl_none",colnames(v)[ncol(v)])
  (testdata <- rbind(data.frame(v[-ncol(v)],rl_none = 0,v[ncol(v)]),vt))[order(as.Date(testdata$Begindatum),decreasing = T),]
})

res <- data.frame(Reduce(rbind,r),row.names = NULL)

on this data frame, unfortunately with no luck:

structure(list(ï..CRM.relatienummer = structure(c("d45248b8974dc4f8ff948779e0fd07e20f304e929ada4e14c0420aebed81e9b5", 
"2ab04e80b3e64601147df977d6054c04ffa80014b3691b25dd1cc8ef85cea06a", 
"2ab04e80b3e64601147df977d6054c04ffa80014b3691b25dd1cc8ef85cea06a", 
"bcf2c99e6dc974380f967204b9623dce2c8a3fad694dc0b4430fcbf77f8f39f3", 
"bcf2c99e6dc974380f967204b9623dce2c8a3fad694dc0b4430fcbf77f8f39f3", 
"f8610cd0237858ac9384d6ba209759ae306860ffabb3f8e6c3d6fc68dbaddc51", 
"e5b8b3f46165e48aec8bbe65ed1cb29d18a0492fbcac44803372f672348459db", 
"c737815b2365b01a8a85c380364a0f721685a131de98cd7790b4d40bb8c4e05b", 
"b9c0272caa8d5d3497d28cce3bda5d3d17c22f18c5f65c5e82c572b410a8ea71", 
"b9c0272caa8d5d3497d28cce3bda5d3d17c22f18c5f65c5e82c572b410a8ea71", 
"539c6c3e604245008daefbe500ff29357bee91f82a7896126bd0f69848524cb7", 
"d361338bed51cb9c8aa73fd8914cbf392f4e05e7b073f637f7b150cf02b89c8c", 
"505d3df3f1298e07aa96073490b72acd2391da06ad4cfbd5a9fbde3a3de79684", 
"826443481cbb5b4e061040d443a0ce8d94322615d8ffae1e68b2ff7d896afcf7", 
"2b59a1ec028c261c0f22cd6a49220dc7cec9a9fb0fabe2296b4ba77a60cfdaae"
), class = c("hash", "sha256")), Begindatum = c("2019-06-14", 
"2019-03-01", "2019-09-02", "2019-03-03", "2019-04-01", "2019-09-21", 
"2019-02-02", "2019-06-11", "2019-02-05", "2019-02-09", "2019-07-24", 
"2019-05-08", "2019-09-27", "2019-08-03", "2019-04-03"), Einddatum = c("2022-04-20", 
"2019-09-01", "2022-04-20", "2019-03-31", "2022-04-20", "2022-04-20", 
"2019-02-14", "2019-07-08", "2019-02-11", "2020-02-08", "2019-09-03", 
"2019-06-18", "2019-11-07", "2019-08-16", "2022-04-20"), Status_dummy = c(1, 
0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1), relationlength_fixed = c(0, 
184, 961, 28, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0), relationlength_promo = c(1041, 
0, 0, 0, 1115, 942, 12, 0, 0, 364, 0, 0, 0, 0, 1113), relationlength_trial = c(0, 
0, 0, 0, 0, 0, 0, 27, 0, 0, 41, 41, 41, 13, 0), rl_none = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), fixed_dummy = c(0, 
1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), trial_dummy = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0), promotional_dummy = c(1, 
0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1), active_subscr_dummy = c(3, 
0, 5, 0, 3, 3, 0, 0, 0, 3, 0, 0, 1, 0, 3), hashedEmail = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c("1:1", 
"1:2", "1:3", "1:4", "1:5", "1:6", "1:7", "1:8", "1:9", "1:10", 
"1:11", "1:12", "1:13", "1:14", "1:15"), class = "data.frame")

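Two properties of that data look likely to trip up the attempt, which assumes every user has at least two rows sorted newest first: 9 of the 12 users in the dput have a single row, and the three users with two rows list them oldest first. (The data also already contains an rl_none column, which the code adds a second time.) A small check, sketched with a hypothetical helper `check_users` on the first three rows of the dput, with IDs shortened to their first eight characters:

```r
# Hypothetical helper: per user, the number of rows and whether they already
# run from newest to oldest Begindatum (what the attempt relies on)
check_users <- function(id, begin) {
  per_user <- split(as.numeric(as.Date(begin)), as.character(id))
  data.frame(
    id          = names(per_user),
    n_rows      = lengths(per_user),
    desc_sorted = vapply(per_user, function(b) !is.unsorted(rev(b)), logical(1)),
    row.names   = NULL
  )
}

# First three rows of the dput above, IDs shortened to 8 characters
ex <- data.frame(
  id         = c("d45248b8", "2ab04e80", "2ab04e80"),
  Begindatum = c("2019-06-14", "2019-03-01", "2019-09-02")
)
chk <- check_users(ex$id, ex$Begindatum)
chk

# With a single row, x[-1] is empty, and data.frame() refuses to combine
# a length-1 column with a length-0 one
msg <- tryCatch(data.frame(1, character(0)), error = conditionMessage)
msg
```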
Hopefully this is what you are expecting:

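# One data frame per user; rows within each user are assumed to be
# sorted by Begindate, newest first (as in the sample data)
dfs <- split(df, df$ID)

r <- lapply(seq_along(dfs), function(k) {
  v <- dfs[[k]]
  # One "none" row between each pair of consecutive subscriptions:
  # it starts the day after the older one ends and ends the day
  # before the newer one begins
  vt <- data.frame(unique(v$ID),
                   as.character((as.Date(v$Enddate) + 1)[-1]),
                   as.character((as.Date(v$Begindate) - 1)[-nrow(v)]),
                   "none",
                   0,
                   0,
                   0,
                   (as.Date(v$Begindate) - 1)[-nrow(v)] - (as.Date(v$Enddate) + 1)[-1],
                   NA)
  # Same column names as v, with rl_none inserted before the last column
  colnames(vt) <- c(colnames(v)[-ncol(v)], "rl_none", colnames(v)[ncol(v)])
  # Give the real subscriptions rl_none = 0, append the gap rows,
  # and sort newest first
  out <- rbind(data.frame(v[-ncol(v)], rl_none = 0, v[ncol(v)]), vt)
  out[order(as.Date(out$Begindate), decreasing = TRUE), ]
})

res <- data.frame(Reduce(rbind, r), row.names = NULL)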

which gives

> res
  ID  Begindate    Enddate Subscrtype active rl_fixed rl_promo rl_none Productgroup
1  1 2019-08-26 2022-04-20      fixed      1      968        0       0            1
2  1 2019-08-24 2019-08-25       none      0        0        0       1           NA
3  1 2018-08-24 2019-08-23      fixed      0      364        0       0            1
4  1 2016-08-24 2018-08-23       none      0        0        0     729           NA
5  1 2015-08-24 2016-08-23      promo      0        0      364       0            2
6  2 2019-08-26 2019-09-12      fixed      0       17        0       0            1
7  2 2019-08-24 2019-08-25       none      0        0        0       1           NA
8  2 2018-08-24 2019-08-23      fixed      0      364        0       0            1

DATA

structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Begindate = structure(c(3L, 
2L, 1L, 3L, 2L), .Label = c("2015-08-24", "2018-08-24", "2019-08-26"
), class = "factor"), Enddate = structure(c(4L, 2L, 1L, 3L, 2L
), .Label = c("2016-08-23", "2019-08-23", "2019-09-12", "2022-04-20"
), class = "factor"), Subscrtype = structure(c(1L, 1L, 2L, 1L, 
1L), .Label = c("fixed", "promo"), class = "factor"), active = c(1L, 
0L, 0L, 0L, 0L), rl_fixed = c(968L, 364L, 0L, 17L, 364L), rl_promo = c(0L, 
0L, 364L, 0L, 0L), Productgroup = c(1L, 1L, 2L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-5L))
