I have little experience with R, and I am not sure how to do the following calculation in it. Should I imitate Excel, or is there a better way to do a simple Excel-style cell subtraction?
I have the following data in R.
year marketplace bridged_on value
01/01/2018 US A 1,710,103,328
01/01/2018 US B 1,710,103,328
01/01/2018 US C 1,710,103,328
01/01/2018 US D 1,710,103,328
01/01/2019 US A 1,669,210,438
01/01/2019 US B 1,653,940,292
01/01/2019 US C 1,624,487,359
01/01/2019 US D 1,617,335,174
01/01/2020 US A 1,674,636,402
01/01/2020 US B 1,647,437,876
01/01/2020 US C 1,601,234,000
01/01/2020 US D 1,591,107,584
I need to calculate the year-over-year change. In Excel, I create a pivot table with years as columns and then apply a subtraction formula across cells.
This is a screenshot from calculations done in Excel. I am calculating the difference between A and B, B and C, C and D and then subtracting the same difference from the previous year. For example, the calculation in H6 is (C6-C7)-(D6-D7).
I am not sure how to reproduce the same calculation in R and have G5 to H8 as an output in R.
library(dplyr)
library(stringr)
library(purrr)
library(lubridate)
library(readr)
library(reshape2)
data <- read_delim("year marketplace bridged_on value
01/01/2018 US A 1,710,103,328
01/01/2018 US B 1,710,103,328
01/01/2018 US C 1,710,103,328
01/01/2018 US D 1,710,103,328
01/01/2019 US A 1,669,210,438
01/01/2019 US B 1,653,940,292
01/01/2019 US C 1,624,487,359
01/01/2019 US D 1,617,335,174
01/01/2020 US A 1,674,636,402
01/01/2020 US B 1,647,437,876
01/01/2020 US C 1,601,234,000
01/01/2020 US D 1,591,107,584 ",delim = " ")
colnames(data) <- str_trim(colnames(data))
data <- map_dfc(data,str_trim)
data <- data %>%
  mutate(year = mdy(year),
         value = parse_number(value))
#display cleaned data
> data
# A tibble: 12 x 4
   year       marketplace bridged_on      value
   <date>     <chr>       <chr>           <dbl>
 1 2018-01-01 US          A          1710103328
 2 2018-01-01 US          B          1710103328
 3 2018-01-01 US          C          1710103328
 4 2018-01-01 US          D          1710103328
 5 2019-01-01 US          A          1669210438
 6 2019-01-01 US          B          1653940292
 7 2019-01-01 US          C          1624487359
 8 2019-01-01 US          D          1617335174
 9 2020-01-01 US          A          1674636402
10 2020-01-01 US          B          1647437876
11 2020-01-01 US          C          1601234000
12 2020-01-01 US          D          1591107584
I believe your calculation in row 8 is wrong, though: according to the formula you provided, it uses the grand total.
To do it in R, keep the data frame in long format and use dplyr::lag() to calculate the difference between consecutive years. Finally, use reshape2::dcast() to convert from long format to wide format.
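As a quick illustration of what lag() does, here is a minimal sketch on a made-up vector (x and step_change are hypothetical names, not part of the question's data). Subtracting the lagged vector gives the change from the previous element, which is the building block of the pipeline in this answer.

```r
library(dplyr)

# Hypothetical toy vector, not the question's data
x <- c(10, 13, 19)

# lag() shifts the values one position later; the first element becomes NA
lag(x)

# Subtracting the lagged values gives the change from the previous element
step_change <- x - lag(x)
step_change
# NA 3 6
```

The leading NA is why the pipeline filters out rows with is.na() after each subtraction.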
You can break the pipeline apart and inspect the intermediate result of each step.
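result <- data %>%
  mutate(year = year(year)) %>%
  # change versus the previous year, within each bridged_on level
  group_by(bridged_on) %>%
  mutate(annual_diff = value - lag(value)) %>%
  ungroup() %>%
  dplyr::filter(!is.na(annual_diff)) %>%
  # difference between consecutive bridged_on levels, within each year
  group_by(year) %>%
  mutate(annual_diff2 = annual_diff - lag(annual_diff)) %>%
  dplyr::filter(!is.na(annual_diff2)) %>%
  select(year, bridged_on, annual_diff2) %>%
  ungroup() %>%
  # name the value column explicitly so dcast() does not have to guess it
  dcast(bridged_on ~ year, value.var = "annual_diff2")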
> result
  bridged_on      2019      2020
1          B -15270146 -11928380
2          C -29452933 -16750943
3          D  -7152185  -2974231
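To sanity-check one cell against the Excel formula (C6-C7)-(D6-D7), here is a small sketch that recomputes the B / 2019 value by hand. It assumes that spreadsheet columns C and D hold 2018 and 2019 and that rows 6 and 7 hold A and B; the screenshot is not available here, so that mapping is a guess. The variable names are made up for illustration.

```r
# Values copied from the question's data for bridged_on A and B
a_2018 <- 1710103328
b_2018 <- 1710103328
a_2019 <- 1669210438
b_2019 <- 1653940292

# Excel-style formula, under the assumed cell mapping:
# (previous-year A - B) - (current-year A - B)
check_b_2019 <- (a_2018 - b_2018) - (a_2019 - b_2019)
check_b_2019
# -15270146
```

This matches the B / 2019 cell of result, so under that assumption the two-step lag() approach gives the same number as the spreadsheet formula.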