How to calculate a row value that is based on the previous value in the same column (in R)?
For a time series analysis, I have a data frame that is organized by period. It has a column with the number of patients who start therapy and a column with the number who discontinue therapy. Now I want to calculate the current users of therapy per period. As an intermediate step, I calculated the difference between 'starters' and 'discontinue'.
> df <- data.frame(
    period = c(1,2,3,4,5,6),
    starters = c(595, 113, 36, 489, 28, 101),
    discontinue = c(0, 11, 6, 8, 14, 8))
> df$difference <- df$starters - df$discontinue
> df
  period starters discontinue difference
1      1      595           0        595
2      2      113          11        102
3      3       36           6         30
4      4      489           8        481
5      5       28          14         14
6      6      101           8         93
To calculate current users, I would like to add a column 'current.users' whose initial value is df$difference[1] and where each following row adds that row's difference to the previous value. The output should look like:
> df
  period starters discontinue difference current.users
1      1      595           0        595           595
2      2      113          11        102           697
3      3       36           6         30           727
4      4      489           8        481          1208
5      5       28          14         14          1222
6      6      101           8         93          1315
I tried to use for loops and data.table, but I cannot work out how to calculate the next value in the column based on the previous value in the same column. Does anyone know the correct code for this? Thanks in advance!
You can do a simple cumsum like this:
df <- data.frame(
  period = c(1,2,3,4,5,6),
  starters = c(595, 113, 36, 489, 28, 101),
  discontinue = c(0, 11, 6, 8, 14, 8))
df$difference <- df$starters - df$discontinue
# cumsum
df$current.users <- cumsum(df$difference)
df
#>   period starters discontinue difference current.users
#> 1      1      595           0        595           595
#> 2      2      113          11        102           697
#> 3      3       36           6         30           727
#> 4      4      489           8        481          1208
#> 5      5       28          14         14          1222
#> 6      6      101           8         93          1315
Created on 2022-08-26 with reprex v2.0.2
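For intuition, cumsum() computes the same recurrence the question describes: each value is the previous value in the column plus the current row's difference. A minimal sketch of that recurrence as an explicit loop follows. The variable names `difference` and `current_users` are illustrative only, and in practice cumsum() is simpler and faster.

```r
# Differences from the example data (starters - discontinue)
difference <- c(595, 102, 30, 481, 14, 93)

# Pre-allocate the result and seed it with the first difference
current_users <- numeric(length(difference))
current_users[1] <- difference[1]

# Each later value is the previous value plus the current difference
for (i in seq_along(difference)[-1]) {
  current_users[i] <- current_users[i - 1] + difference[i]
}

# The loop and the vectorised version agree
stopifnot(identical(current_users, cumsum(difference)))
```

The loop is only worth keeping if the update rule is something cumsum() cannot express; for a plain running total the one-liner above is enough.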
With dplyr:
library(tidyverse)
df %>%
  mutate(difference = starters - discontinue,
         current.users = cumsum(difference))
  period starters discontinue difference current.users
1      1      595           0        595           595
2      2      113          11        102           697
3      3       36           6         30           727
4      4      489           8        481          1208
5      5       28          14         14          1222
6      6      101           8         93          1315
Without the difference column:
df %>%
  mutate(current.users = cumsum(starters - discontinue))
  period starters discontinue current.users
1      1      595           0           595
2      2      113          11           697
3      3       36           6           727
4      4      489           8          1208
5      5       28          14          1222
6      6      101           8          1315
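Since the question mentions data.table, the same running total might look like this there. This is a sketch that assumes the data.table package is installed; the object name `dt` is illustrative.

```r
library(data.table)

# Same example data as in the question, built as a data.table
dt <- data.table(
  period = 1:6,
  starters = c(595, 113, 36, 489, 28, 101),
  discontinue = c(0, 11, 6, 8, 14, 8)
)

# := adds the column by reference, so no reassignment is needed
dt[, current.users := cumsum(starters - discontinue)]
```

If the data held several independent series (for example one per drug), adding a `by =` argument to the same call would restart the running total within each group.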