簡體   English   中英

使用dplyr計算R中的百分比變化

[英]Calculate Percentage Change in R using dplyr

我想計算的百分比ProfitYEAR這是一個相當簡單的任務,但不知何故,我得到NA 我檢查了之前問過的相同問題,但是我不明白為什么我會得到NA 數據如下:

> df_vertical_growth
   YEAR                        VERTICAL   Profit pct_change
1  2017                     AGRICULTURE        0         NA
2  2016                     AGRICULTURE  2053358         NA
3  2015                     AGRICULTURE        0         NA
4  2014                     AGRICULTURE  2370747         NA
5  2013                     AGRICULTURE  4066693         NA
6  2017                   COMMUNICATION        0         NA
7  2016                   COMMUNICATION  1680074         NA
8  2015                   COMMUNICATION  1322470         NA
9  2014                   COMMUNICATION  1460133         NA
10 2013                   COMMUNICATION  1529863         NA
11 2017                    CONSTRUCTION        0         NA
12 2016                    CONSTRUCTION        0         NA
13 2015                    CONSTRUCTION        0         NA
14 2014                    CONSTRUCTION  8250149         NA
15 2013                    CONSTRUCTION        0         NA
16 2017                       EDUCATION        0         NA
17 2016                       EDUCATION 12497015         NA
18 2015                       EDUCATION 13437356         NA
19 2014                       EDUCATION 10856685         NA
20 2013                       EDUCATION 13881127         NA
21 2017 FINANCE, INSURANCE, REAL ESTATE        0         NA
22 2016 FINANCE, INSURANCE, REAL ESTATE        0         NA
23 2015 FINANCE, INSURANCE, REAL ESTATE        0         NA
24 2014 FINANCE, INSURANCE, REAL ESTATE        0         NA
25 2013 FINANCE, INSURANCE, REAL ESTATE  5008436         NA
26 2017                      HEALTHCARE        0         NA
27 2016                      HEALTHCARE        0         NA
28 2015                      HEALTHCARE        0         NA
29 2014                      HEALTHCARE  4554364         NA
30 2013                      HEALTHCARE  5078130         NA
31 2017                     HOSPITALITY        0         NA
32 2016                     HOSPITALITY  4445512         NA
33 2015                     HOSPITALITY  5499419         NA
34 2014                     HOSPITALITY  9060639         NA
35 2013                     HOSPITALITY  4391522         NA
36 2017                   MANUFACTURING        0         NA
37 2016                   MANUFACTURING        0         NA
38 2015                   MANUFACTURING        0         NA
39 2014                   MANUFACTURING        0         NA
40 2013                   MANUFACTURING 27466974         NA
41 2017                          MINING        0         NA
42 2016                          MINING  4359251         NA
43 2015                          MINING  4163201         NA
44 2014                          MINING  6272530         NA
45 2013                          MINING  6668191         NA
46 2017                           OTHER        0         NA
47 2016                           OTHER        0         NA
48 2015                           OTHER        0         NA
49 2014                           OTHER  5935199         NA
50 2013                           OTHER  3585969         NA
51 2017                    PUBLIC ADMIN        0         NA
52 2016                    PUBLIC ADMIN        0         NA
53 2015                    PUBLIC ADMIN        0         NA
54 2014                    PUBLIC ADMIN        0         NA
55 2013                    PUBLIC ADMIN        0         NA
56 2017                    RETAIL TRADE        0         NA
57 2016                    RETAIL TRADE        0         NA
58 2015                    RETAIL TRADE        0         NA
59 2014                    RETAIL TRADE        0         NA
60 2013                    RETAIL TRADE        0         NA
61 2017                         SERVICE        0         NA
62 2016                         SERVICE        0         NA
63 2015                         SERVICE        0         NA
64 2014                         SERVICE        0         NA
65 2013                         SERVICE 28018522         NA
66 2017                  TRANSPORTATION        0         NA
67 2016                  TRANSPORTATION        0         NA
68 2015                  TRANSPORTATION        0         NA
69 2014                  TRANSPORTATION        0         NA
70 2013                  TRANSPORTATION  8430244         NA
71 2017                         UTILITY        0         NA
72 2016                         UTILITY  3551989         NA
73 2015                         UTILITY  6535248         NA
74 2014                         UTILITY  3995486         NA
75 2013                         UTILITY  4477617         NA
76 2017                 WHOLESALE TRADE        0         NA
77 2016                 WHOLESALE TRADE  6898041         NA
78 2015                 WHOLESALE TRADE  7120828         NA
79 2014                 WHOLESALE TRADE        0         NA
80 2013                 WHOLESALE TRADE        0         NA

我的代碼:

df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% 
     mutate(pct_change = ((Profit/lag(Profit) - 1) * 100))

現在,基於此處提供的答案, 如何為R中的多個列計算組內的百分比變化? ,還嘗試執行以下操作:

pct <- function(x) {x / lag(x) - 1}
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% mutate_at(funs=pct,Profit)

但是我收到以下錯誤:

check_dot_cols(.vars,.cols)中的錯誤:找不到對象'Profit'

有人可以告訴我,我做錯了什么嗎? 非常感謝。

問題在於每個小組都有一個觀察結果。 每個垂直行業一年。 一個觀察的滯后是什么? 此外,由於年份以降序排列,我相信您需要領導。

library(tidyverse)
z %>%
  group_by(VERTICAL) %>% 
  mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2017 AGRICULTURE         0    -100   
 2  2016 AGRICULTURE   2053358     Inf   
 3  2015 AGRICULTURE         0    -100   
 4  2014 AGRICULTURE   2370747    - 41.7 
 5  2013 AGRICULTURE   4066693      NA   
 6  2017 COMMUNICATION       0    -100   
 7  2016 COMMUNICATION 1680074      27.0 
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2014 COMMUNICATION 1460133    -  4.56
10  2013 COMMUNICATION 1529863      NA   

該解決方案假定年份以正確的順序排列,以確保:

z %>%
  group_by(VERTICAL) %>% 
  arrange(YEAR, .by_group = TRUE) %>%
  mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2013 AGRICULTURE   4066693      NA   
 2  2014 AGRICULTURE   2370747    - 41.7 
 3  2015 AGRICULTURE         0    -100   
 4  2016 AGRICULTURE   2053358     Inf   
 5  2017 AGRICULTURE         0    -100   
 6  2013 COMMUNICATION 1529863      NA   
 7  2014 COMMUNICATION 1460133    -  4.56
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2016 COMMUNICATION 1680074      27.0 
10  2017 COMMUNICATION       0    -100   

或使用

arrange(desc(YEAR), .by_group = TRUE)

lead

z是:

structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 
2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 
2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 
2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 
2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION", 
"CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING", 
"MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"), 
    Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L, 
    1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L, 
    12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L, 
    5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L, 
    0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L, 
    6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L, 
    28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L, 
    3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", 
"48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75"), class = "data.frame")

假設您的“ Profit列代表給定年份的利潤,此函數將計算n年與n-1年之間的差,除以n-1年的值,然后乘以100得到一個百分比。 如果n-1年的值為零,則沒有有效的百分比變化。 重要的是,僅按“ VERTICAL而非“ YEAR對數據進行分組。

profit_pct_change <- function(x) {
  x <- x[order(x$YEAR, decreasing = TRUE), ] # Confirms ordered by decreasing year
  pct_change <- -diff(x$Profit)/x$Profit[-1] * 100 # Gets percent change in profit from preceding year
  data.frame(year = x$YEAR[-length(x$YEAR)], pct_change = pct_change) # Returns data frame
}

df_vertical_growth %>% 
  group_by(VERTICAL) %>%
  do(profit_pct_change(.))

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM