
Calculating percentage change of panel data for other entities

I have a very large data frame in panel form. It holds production data for each industry within each country over a range of years. I am looking for code that calculates the year-to-year percentage change in output within each industry and then, for every row, sums those changes over all countries other than the one in that row.

This is hard to explain in words, so here is an example. Using this code:

panel <- cbind.data.frame(industry =  rep(c("Logging" , "Automobile") , each = 9) ,
               country = rep(c("Austria" , "Belgium" , "Croatia") , each = 3 , times = 2) ,
               year = rep(c(2000:2002) , times = 6) ,
               output = c(2,3,4,1,5,8,1,2,4,2,3,4,6,7,8,9,10,11))

That gives this data frame:

     industry country year output
1     Logging Austria 2000      2
2     Logging Austria 2001      3
3     Logging Austria 2002      4
4     Logging Belgium 2000      1
5     Logging Belgium 2001      5
6     Logging Belgium 2002      8
7     Logging Croatia 2000      1
8     Logging Croatia 2001      2
9     Logging Croatia 2002      4
10 Automobile Austria 2000      2
11 Automobile Austria 2001      3
12 Automobile Austria 2002      4
13 Automobile Belgium 2000      6
14 Automobile Belgium 2001      7
15 Automobile Belgium 2002      8
16 Automobile Croatia 2000      9
17 Automobile Croatia 2001     10
18 Automobile Croatia 2002     11

I compute the year-to-year percentage change within each country-industry pair using the tidyverse:

library(tidyverse)

panel <- panel %>%
  group_by(country , industry) %>%
  mutate(per_change = (output - lag(output)) / lag(output))

giving:

# A tibble: 18 x 5
# Groups:   country, industry [6]
   industry   country  year output per_change
   <fct>      <fct>   <int>  <dbl>      <dbl>
 1 Logging    Austria  2000      2     NA    
 2 Logging    Austria  2001      3      0.5  
 3 Logging    Austria  2002      4      0.333
 4 Logging    Belgium  2000      1     NA    
 5 Logging    Belgium  2001      5      4    
 6 Logging    Belgium  2002      8      0.6  
 7 Logging    Croatia  2000      1     NA    
 8 Logging    Croatia  2001      2      1    
 9 Logging    Croatia  2002      4      1    
10 Automobile Austria  2000      2     NA    
11 Automobile Austria  2001      3      0.5  
12 Automobile Austria  2002      4      0.333
13 Automobile Belgium  2000      6     NA    
14 Automobile Belgium  2001      7      0.167
15 Automobile Belgium  2002      8      0.143
16 Automobile Croatia  2000      9     NA    
17 Automobile Croatia  2001     10      0.111
18 Automobile Croatia  2002     11      0.1  

So I would like code that returns: for row 1, NA; for row 2, the sum of the 2001 percentage changes in the logging industry for every country except Austria (4 + 1 = 5); for row 3, the sum of the 2002 percentage changes in logging for every country except Austria (0.6 + 1 = 1.6); for row 4, NA again; for row 5, the sum of the 2001 percentage changes in logging for every country except Belgium (0.5 + 1 = 1.5); and so on.

I don't know how to do this other than by hand.

The code should also be flexible enough to handle N countries and Y industries.

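You can do this in three steps:

  • first, group the "panel" table by industry and year and sum "per_change" within each group
  • second, join this grouped table back onto your main table
  • lastly, subtract each row's own "per_change" from the grouped sum, which leaves the sum over all other countries

Run this after your code: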

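library(dplyr)

# Drop the grouping left over from group_by() and work on a plain data frame
d1 <- as.data.frame(panel)

# Sum per_change within each industry-year. The formula interface drops NA rows,
# so year 2000 is absent from d2 and comes back as NA after the join.
d2 <- aggregate(per_change ~ industry + year, data = d1, FUN = sum)

# Attach the industry-year total to every row
# (per_change.x = the row's own change, per_change.y = the group total)
panel <- left_join(d1, d2, by = c("industry", "year"))

# Group total minus the row's own change = sum over all other countries
panel$exc_per_change <- panel$per_change.y - panel$per_change.x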

The output is:

> head(panel)
  industry country year output per_change.x per_change.y exc_per_change
1  Logging Austria 2000      2           NA           NA             NA
2  Logging Austria 2001      3    0.5000000     5.500000       5.000000
3  Logging Austria 2002      4    0.3333333     1.933333       1.600000
4  Logging Belgium 2000      1           NA           NA             NA
5  Logging Belgium 2001      5    4.0000000     5.500000       1.500000
6  Logging Belgium 2002      8    0.6000000     1.933333       1.333333
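
The same three steps can also be written as a single dplyr pipeline, without the intermediate table or the join. This is only a sketch under a few assumptions: it rebuilds the example data from the question, the name `result` is made up for illustration, and `lag()` is given `order_by = year` so the calculation does not depend on the rows already being sorted by year. It uses nothing specific to three countries or two industries, so it should carry over to N countries and Y industries.

```r
library(dplyr)

# Rebuild the example data from the question
panel <- data.frame(
  industry = rep(c("Logging", "Automobile"), each = 9),
  country  = rep(c("Austria", "Belgium", "Croatia"), each = 3, times = 2),
  year     = rep(2000:2002, times = 6),
  output   = c(2, 3, 4, 1, 5, 8, 1, 2, 4, 2, 3, 4, 6, 7, 8, 9, 10, 11)
)

# `result` is a hypothetical name chosen for this sketch
result <- panel %>%
  # Year-to-year change within each country-industry series
  group_by(industry, country) %>%
  mutate(per_change = (output - lag(output, order_by = year)) /
                       lag(output, order_by = year)) %>%
  # Regroup by industry-year: group total minus the row's own value
  # leaves the sum over all other countries
  group_by(industry, year) %>%
  mutate(exc_per_change = sum(per_change) - per_change) %>%
  ungroup()

head(as.data.frame(result))
```

If the panel is unbalanced (for example, one country enters a year later than the others), a single NA in `per_change` turns the whole industry-year sum into NA. Writing `sum(per_change, na.rm = TRUE) - per_change` instead would ignore the missing values of the other countries. Whether that is the right treatment depends on the analysis.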
