I'm trying to calculate the cumulative sum of a column in my data frame, but only when a condition is met.
I am very new to R and am trying to convert the SPSS code below to R. The goal is the data frame below, with a new column called cumulative: if LCA is the same as in the row above, add health_net_cost to the cumulative value from the row above; otherwise start again from health_net_cost.
SPSS code:
IF LAG(LCA) NE LCA cumulative=health_net_cost.
IF LAG(LCA)=LCA cumulative=LAG(cumulative)+health_net_cost.
EXECUTE.
Data frame in R:
LCA health_net_cost cumulative
10 100 100
10 200 300
10 300 600
28 400 1000
28 100 1100
8 100 1200
8 200 1400
8 300 1700
This may be the solution you are looking for. It uses the cumsum function.
df <- data.frame(LCA = c(10, 10, 10, 28, 28, 8, 8, 8),
                 Health_Net_Cost = c(100, 200, 300, 400, 100, 100, 200, 300))
df
Output:
LCA Health_Net_Cost
10 100
10 200
10 300
28 400
28 100
8 100
8 200
8 300
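Run:
library(dplyr)  # provides %>%, group_by() and mutate()
# cumsum() restarts within each LCA group
cum_df <- df %>% group_by(LCA) %>% mutate(Cumulative = cumsum(Health_Net_Cost))
cum_df
Your expected output: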
LCA Health_Net_Cost Cumulative
10 100 100
10 200 300
10 300 600
28 400 400
28 100 500
8 100 100
8 200 300
8 300 600
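If you would rather not load a package, the same grouped cumulative sum can be sketched in base R with ave(). This is only a minimal sketch that reuses the example data from this answer, not a replacement for the dplyr version:

```r
# Same example data as in the answer above
df <- data.frame(
  LCA = c(10, 10, 10, 28, 28, 8, 8, 8),
  Health_Net_Cost = c(100, 200, 300, 400, 100, 100, 200, 300)
)

# ave() applies FUN within each level of the grouping variable (LCA)
# and returns the results in the original row order
df$Cumulative <- ave(df$Health_Net_Cost, df$LCA, FUN = cumsum)
df
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated as a grouping variable.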
EDIT:
If you want the cumulative sums grouped by LCA, this may help:
install.packages("dplyr")  # only needed once per R installation
library(dplyr)
df %>%
  group_by(LCA) %>%
  mutate(cumulative = cumsum(Health_Net_Cost))
# A tibble: 8 x 3
# Groups:   LCA [3]
    LCA Health_Net_Cost cumulative
  <dbl>           <dbl>      <dbl>
1    10             100        100
2    10             200        300
3    10             300        600
4    28             400        400
5    28             100        500
6     8             100        100
7     8             200        300
8     8             300        600
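One caveat: the SPSS code compares each row only with the row directly above it, whereas grouping by LCA pools every row that shares a value, wherever it sits. For the example data the two give the same result because each LCA value forms a single contiguous block. If the same LCA can reappear later in the data, a run-based grouping is closer to the LAG logic. Below is a minimal base R sketch; the data frame df2 and the helper names new_run and run_id are made up for illustration:

```r
# Hypothetical data: LCA 10 appears in two separate runs
df2 <- data.frame(
  LCA = c(10, 10, 28, 10, 10),
  Health_Net_Cost = c(100, 200, 400, 50, 25)
)

# TRUE wherever LCA differs from the previous row; the first row always starts a run
new_run <- c(TRUE, df2$LCA[-1] != df2$LCA[-nrow(df2)])

# Counting the run starts gives each consecutive run its own id: 1 1 2 3 3
run_id <- cumsum(new_run)

# Cumulative sum that restarts whenever LCA changes from the row above
df2$cumulative <- ave(df2$Health_Net_Cost, run_id, FUN = cumsum)
df2
```

Here the second run of LCA 10 starts again at 50 instead of continuing from 300, which is what the two IF statements in the SPSS code would do.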