
Replicating Excel cell formulae in R

I have little experience with R and am not sure how to do the following calculation in it. Should I imitate Excel, or is there a better way to do this simple Excel cell subtraction?

I have the following data in R.

year    marketplace bridged_on  value
01/01/2018  US  A    1,710,103,328 
01/01/2018  US  B    1,710,103,328 
01/01/2018  US  C    1,710,103,328 
01/01/2018  US  D    1,710,103,328 
01/01/2019  US  A    1,669,210,438 
01/01/2019  US  B    1,653,940,292 
01/01/2019  US  C    1,624,487,359 
01/01/2019  US  D    1,617,335,174 
01/01/2020  US  A    1,674,636,402 
01/01/2020  US  B    1,647,437,876 
01/01/2020  US  C    1,601,234,000 
01/01/2020  US  D    1,591,107,584 

I need to calculate the year-over-year change. In Excel, I create a pivot table with years as columns and then apply a subtraction formula across cells.

This is a screenshot of the calculations done in Excel. I calculate the difference between A and B, B and C, and C and D, and then subtract the same difference for the previous year. For example, the calculation in H6 is (C6-C7)-(D6-D7).

I am not sure how to reproduce the same calculation in R and get cells G5 to H8 as the output.

Excel screenshot

Prepare and clean data

library(dplyr)
library(stringr)
library(purrr)
library(lubridate)
library(readr)
library(reshape2)

# read_table() splits on runs of whitespace, so the uneven spacing is handled;
# every column is read as character and converted below
data <- read_table("year    marketplace bridged_on  value
01/01/2018  US  A    1,710,103,328 
01/01/2018  US  B    1,710,103,328 
01/01/2018  US  C    1,710,103,328 
01/01/2018  US  D    1,710,103,328 
01/01/2019  US  A    1,669,210,438 
01/01/2019  US  B    1,653,940,292 
01/01/2019  US  C    1,624,487,359 
01/01/2019  US  D    1,617,335,174 
01/01/2020  US  A    1,674,636,402 
01/01/2020  US  B    1,647,437,876 
01/01/2020  US  C    1,601,234,000 
01/01/2020  US  D    1,591,107,584 ", col_types = cols(.default = col_character()))

colnames(data) <- str_trim(colnames(data))
data <- map_dfc(data,str_trim)

data <- data %>%
    mutate(year= mdy(year),
           value = parse_number(value))

# Display the cleaned data

> data 
# A tibble: 12 x 4
   year       marketplace bridged_on      value
   <date>     <chr>       <chr>           <dbl>
 1 2018-01-01 US          A          1710103328
 2 2018-01-01 US          B          1710103328
 3 2018-01-01 US          C          1710103328
 4 2018-01-01 US          D          1710103328
 5 2019-01-01 US          A          1669210438
 6 2019-01-01 US          B          1653940292
 7 2019-01-01 US          C          1624487359
 8 2019-01-01 US          D          1617335174
 9 2020-01-01 US          A          1674636402
10 2020-01-01 US          B          1647437876
11 2020-01-01 US          C          1601234000
12 2020-01-01 US          D          1591107584

To answer your question

I believe your calculation in row 8 is wrong, though: according to the formula you provided, it uses the grand total.

To do this in R, keep the data frame in long format and use dplyr::lag() to compute the differences: first between consecutive years within each bridged_on group, then between consecutive bridged_on rows within each year. Finally, use reshape2::dcast() to convert from long format to wide format.
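
As a minimal sketch of what lag() does inside a group, the example below computes the year-over-year change for bridges A and B only. The `toy` and `yoy` names are made up for illustration; the values are copied from the data above. Because lag() follows row order, the sketch sorts by year first, which is an extra safety step rather than something the pipeline further down does explicitly.

```r
library(dplyr)

# Two bridges, three years each (values copied from the question's data)
toy <- tibble(
  year       = rep(2018:2020, times = 2),
  bridged_on = rep(c("A", "B"), each = 3),
  value      = c(1710103328, 1669210438, 1674636402,
                 1710103328, 1653940292, 1647437876)
)

yoy <- toy %>%
  arrange(bridged_on, year) %>%                  # lag() depends on row order
  group_by(bridged_on) %>%
  mutate(annual_diff = value - lag(value)) %>%   # NA for the first year of each group
  ungroup()

print(yoy)
```

The first year of each group has no predecessor, so its annual_diff is NA; that is why the full pipeline filters out NA rows before the second differencing step.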

You can run the pipeline one step at a time to inspect the intermediate result at each stage.

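result <- data %>%
    mutate(year = year(year)) %>%                              # keep only the calendar year
    group_by(bridged_on) %>%
    mutate(annual_diff = value - lag(value)) %>%               # change vs. previous year, per bridge
    ungroup() %>%
    dplyr::filter(!is.na(annual_diff)) %>%                     # drop 2018, which has no previous year
    group_by(year) %>%
    mutate(annual_diff2 = annual_diff - lag(annual_diff)) %>%  # change vs. previous bridge, per year
    dplyr::filter(!is.na(annual_diff2)) %>%                    # drop A, which has no previous bridge
    select(year,bridged_on,annual_diff2) %>%
    ungroup() %>%
    dcast(bridged_on ~ year, value.var = "annual_diff2")       # long to wide: one column per year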

> result
  bridged_on      2019      2020
1          B -15270146 -11928380
2          C -29452933 -16750943
3          D  -7152185  -2974231
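
If you would rather stay close to the Excel pivot layout, the same numbers can be obtained in base R by putting the values in a matrix (bridges as rows, years as columns) and differencing twice. This is only a sketch, under the assumption that the rows are ordered by year and then by bridged_on as in the data above; the names `m`, `yoy` and `out` are made up for illustration.

```r
# Values in the same order as the data: 2018 A-D, 2019 A-D, 2020 A-D
value <- c(1710103328, 1710103328, 1710103328, 1710103328,
           1669210438, 1653940292, 1624487359, 1617335174,
           1674636402, 1647437876, 1601234000, 1591107584)

# matrix() fills column by column, so each column holds one year
m <- matrix(value, nrow = 4,
            dimnames = list(c("A", "B", "C", "D"),
                            c("2018", "2019", "2020")))

# Year-over-year change: each year's column minus the previous year's column
yoy <- m[, -1] - m[, -ncol(m)]

# diff() on a matrix takes differences between consecutive rows: B-A, C-B, D-C
out <- diff(yoy)

print(out)
```

This mirrors the two subtraction steps of the dplyr pipeline, but it relies on the row order of the input, so the long-format approach is safer when the data may arrive unsorted or with missing combinations.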
