![](/img/trans.png)
[英]dplyr count, sum and calculate the percentage and round to whole number using R
[英]Calculate Percentage Change in R using dplyr
我想計算的百分比Profit
由YEAR
這是一個相當簡單的任務,但不知何故,我得到NA
。 我檢查了之前問過的相同問題,但是我不明白為什么我會得到NA
。 數據如下:
> df_vertical_growth
YEAR VERTICAL Profit pct_change
1 2017 AGRICULTURE 0 NA
2 2016 AGRICULTURE 2053358 NA
3 2015 AGRICULTURE 0 NA
4 2014 AGRICULTURE 2370747 NA
5 2013 AGRICULTURE 4066693 NA
6 2017 COMMUNICATION 0 NA
7 2016 COMMUNICATION 1680074 NA
8 2015 COMMUNICATION 1322470 NA
9 2014 COMMUNICATION 1460133 NA
10 2013 COMMUNICATION 1529863 NA
11 2017 CONSTRUCTION 0 NA
12 2016 CONSTRUCTION 0 NA
13 2015 CONSTRUCTION 0 NA
14 2014 CONSTRUCTION 8250149 NA
15 2013 CONSTRUCTION 0 NA
16 2017 EDUCATION 0 NA
17 2016 EDUCATION 12497015 NA
18 2015 EDUCATION 13437356 NA
19 2014 EDUCATION 10856685 NA
20 2013 EDUCATION 13881127 NA
21 2017 FINANCE, INSURANCE, REAL ESTATE 0 NA
22 2016 FINANCE, INSURANCE, REAL ESTATE 0 NA
23 2015 FINANCE, INSURANCE, REAL ESTATE 0 NA
24 2014 FINANCE, INSURANCE, REAL ESTATE 0 NA
25 2013 FINANCE, INSURANCE, REAL ESTATE 5008436 NA
26 2017 HEALTHCARE 0 NA
27 2016 HEALTHCARE 0 NA
28 2015 HEALTHCARE 0 NA
29 2014 HEALTHCARE 4554364 NA
30 2013 HEALTHCARE 5078130 NA
31 2017 HOSPITALITY 0 NA
32 2016 HOSPITALITY 4445512 NA
33 2015 HOSPITALITY 5499419 NA
34 2014 HOSPITALITY 9060639 NA
35 2013 HOSPITALITY 4391522 NA
36 2017 MANUFACTURING 0 NA
37 2016 MANUFACTURING 0 NA
38 2015 MANUFACTURING 0 NA
39 2014 MANUFACTURING 0 NA
40 2013 MANUFACTURING 27466974 NA
41 2017 MINING 0 NA
42 2016 MINING 4359251 NA
43 2015 MINING 4163201 NA
44 2014 MINING 6272530 NA
45 2013 MINING 6668191 NA
46 2017 OTHER 0 NA
47 2016 OTHER 0 NA
48 2015 OTHER 0 NA
49 2014 OTHER 5935199 NA
50 2013 OTHER 3585969 NA
51 2017 PUBLIC ADMIN 0 NA
52 2016 PUBLIC ADMIN 0 NA
53 2015 PUBLIC ADMIN 0 NA
54 2014 PUBLIC ADMIN 0 NA
55 2013 PUBLIC ADMIN 0 NA
56 2017 RETAIL TRADE 0 NA
57 2016 RETAIL TRADE 0 NA
58 2015 RETAIL TRADE 0 NA
59 2014 RETAIL TRADE 0 NA
60 2013 RETAIL TRADE 0 NA
61 2017 SERVICE 0 NA
62 2016 SERVICE 0 NA
63 2015 SERVICE 0 NA
64 2014 SERVICE 0 NA
65 2013 SERVICE 28018522 NA
66 2017 TRANSPORTATION 0 NA
67 2016 TRANSPORTATION 0 NA
68 2015 TRANSPORTATION 0 NA
69 2014 TRANSPORTATION 0 NA
70 2013 TRANSPORTATION 8430244 NA
71 2017 UTILITY 0 NA
72 2016 UTILITY 3551989 NA
73 2015 UTILITY 6535248 NA
74 2014 UTILITY 3995486 NA
75 2013 UTILITY 4477617 NA
76 2017 WHOLESALE TRADE 0 NA
77 2016 WHOLESALE TRADE 6898041 NA
78 2015 WHOLESALE TRADE 7120828 NA
79 2014 WHOLESALE TRADE 0 NA
80 2013 WHOLESALE TRADE 0 NA
我的代碼:
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>%
mutate(pct_change = ((Profit/lag(Profit) - 1) * 100))
現在,基於此處提供的答案, 如何為R中的多個列計算組內的百分比變化? ,還嘗試執行以下操作:
pct <- function(x) {x / lag(x) - 1}
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% mutate_at(funs=pct,Profit)
但是我收到以下錯誤:
check_dot_cols(.vars,.cols)中的錯誤:找不到對象'Profit'
有人可以告訴我,我做錯了什么嗎? 非常感謝。
問題在於每個小組都有一個觀察結果。 每個垂直行業一年。 一個觀察的滯后是什么? 此外,由於年份以降序排列,我相信您需要領導。
library(tidyverse)
z %>%
group_by(VERTICAL) %>%
mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
#output
YEAR VERTICAL Profit pct_change
<int> <fctr> <int> <dbl>
1 2017 AGRICULTURE 0 -100
2 2016 AGRICULTURE 2053358 Inf
3 2015 AGRICULTURE 0 -100
4 2014 AGRICULTURE 2370747 - 41.7
5 2013 AGRICULTURE 4066693 NA
6 2017 COMMUNICATION 0 -100
7 2016 COMMUNICATION 1680074 27.0
8 2015 COMMUNICATION 1322470 - 9.43
9 2014 COMMUNICATION 1460133 - 4.56
10 2013 COMMUNICATION 1529863 NA
該解決方案假定年份以正確的順序排列,以確保:
z %>%
group_by(VERTICAL) %>%
arrange(YEAR, .by_group = TRUE) %>%
mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
#output
YEAR VERTICAL Profit pct_change
<int> <fctr> <int> <dbl>
1 2013 AGRICULTURE 4066693 NA
2 2014 AGRICULTURE 2370747 - 41.7
3 2015 AGRICULTURE 0 -100
4 2016 AGRICULTURE 2053358 Inf
5 2017 AGRICULTURE 0 -100
6 2013 COMMUNICATION 1529863 NA
7 2014 COMMUNICATION 1460133 - 4.56
8 2015 COMMUNICATION 1322470 - 9.43
9 2016 COMMUNICATION 1680074 27.0
10 2017 COMMUNICATION 0 -100
或使用
arrange(desc(YEAR), .by_group = TRUE)
和lead
z是:
structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L,
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L,
2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L,
2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L,
2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L,
2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L,
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION",
"CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING",
"MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"),
Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L,
1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L,
12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L,
5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L,
0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L,
6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L,
28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L,
3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36",
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47",
"48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75"), class = "data.frame")
假設您的“ Profit
列代表給定年份的利潤,此函數將計算n年與n-1年之間的差,除以n-1年的值,然后乘以100得到一個百分比。 如果n-1年的值為零,則沒有有效的百分比變化。 重要的是,僅按“ VERTICAL
而非“ YEAR
對數據進行分組。
profit_pct_change <- function(x) {
x <- x[order(x$YEAR, decreasing = TRUE), ] # Confirms ordered by decreasing year
pct_change <- -diff(x$Profit)/x$Profit[-1] * 100 # Gets percent change in profit from preceding year
data.frame(year = x$YEAR[-length(x$YEAR)], pct_change = pct_change) # Returns data frame
}
df_vertical_growth %>%
group_by(VERTICAL) %>%
do(profit_pct_change(.))
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.