
How to get a data set from OECD using OECD package in R?

I'm quite new to R and am now trying to get data from the OECD website using the "OECD" package. However, I cannot make it work.

The data I want can be found at https://stats.oecd.org/Index.aspx?DataSetCode=REGION_ECONOM# . I want to extract GVA, the GVA deflator, population and GVA by sector from these tables at the regional level. But every time I run the code below, R doesn't produce any output:

library(OECD)

df <- get_dataset("REGION_ECONOM") 

If any of you could help me I'd be beyond grateful :D

Two main issues arise here. First, since you did not specify any country or region, the call requests the whole dataset, and acquiring it takes a long time. Second, the filter used by the OECD package is order-sensitive: its elements have to follow the sequence of the dataset's dimensions.

Here is an example that obtains the data you requested, considering only one country (AUS, Australia) to keep the answer simple.

library(OECD)

dataset_list <- get_datasets()
search_dataset("regional", data = dataset_list)
# A tibble: 14 x 2
   id                  title                                                          
   <chr>               <chr>                                                          
 1 REGION_DEMOGR       Regional Demography                                            
 2 REGION_ECONOM       Regional Economy                                               
 3 RWB                 Regional Well-Being                                            
 4 REGION_LABOUR       Regional Labour                                                
 5 REGION_SOCIAL       Regional Social and Environmental indicators                   
 6 REGION_INNOVATION   Regional Innovation                                            
 7 REG_BUSI_DEMOG_COPY Regional Business Demography copy                              
 8 SKILLS_2018_REGION  Skill needs - Regional                                         
 9 REG_BUSI_DEMOG      Regional Business Demography                                   
10 REGION_EDUCAT       Regional Education                                             
11 RFD                 Regional Government Finance and Investment Database            
12 RHPI                National and Regional House Price Indices                      
13 REGION_TYPOL        Regional typology                                              
14 RHPI_TARGET         National and Regional House Price Indices - Headline indicators

dataset <- "REGION_ECONOM"

dstruc <- get_data_structure(dataset)
str(dstruc)
List of 13
 $ VAR_DESC       :'data.frame':    13 obs. of  2 variables:
  ..$ id         : chr [1:13] "TL" "REG_ID" "SERIES" "VAR" ...
  ..$ description: chr [1:13] "Territory Level and Typology" "Region" "SNA Classification" "Indicator" ...
 $ TL             :'data.frame':    17 obs. of  2 variables:
  ..$ id   : chr [1:17] "1" "RURB" "1_PU" "1_IN" ...
  ..$ label: chr [1:17] "Country" "Country values by rural/urban typology" "     Country - predominantly urban regions" "     Country - intermediate regions" ...
 $ REG_ID         :'data.frame':    3364 obs. of  2 variables:
  ..$ id   : chr [1:3364] "AUS" "AU1" "AU101" "AU103" ...
  ..$ label: chr [1:3364] "Australia" "New South Wales" "Capital Region" "Central West" ...
 $ SERIES         :'data.frame':    3 obs. of  2 variables:
  ..$ id   : chr [1:3] "SNA_2008" "SNA_1993" "SNA_REF"
  ..$ label: chr [1:3] "Last SNA classification (SNA 2008 or latest available)" "Previous SNA classification (SNA 1993, discontinued series)" "Reference data"
 $ VAR            :'data.frame':    49 obs. of  2 variables:
  ..$ id   : chr [1:49] "GDP" "GVA_TOTAL" "GVA_IND_TOTAL" "GVA_IND_10_VA" ...
  ..$ label: chr [1:49] "Regional GDP" "Regional Gross Value Added, total activities" "Regional Gross Value Added, total activities" "GVA in agriculture, forestry and fishing (ISIC rev4)" ...
 $ MEAS           :'data.frame':    29 obs. of  2 variables:
  ..$ id   : chr [1:29] "REG" "CURR_PR" "USD_PPP" "REAL_PR" ...
  ..$ label: chr [1:29] "Regional values (in millions)" "      Millions National currency, current prices" "      Millions USD, current prices, current PPP" "      Millions National currency, constant prices, base year 2015" ...
 $ POS            :'data.frame':    4 obs. of  2 variables:
  ..$ id   : chr [1:4] "ALL" "MAX" "MIN" "AVG"
  ..$ label: chr [1:4] "All regions" "Highest regional value in the country by Territorial Level and selected indicators" "Lowest regional value in the country by Territorial Level and selected indicators" "National average"
 $ TIME           :'data.frame':    25 obs. of  2 variables:
  ..$ id   : chr [1:25] "1995" "1996" "1997" "1998" ...
  ..$ label: chr [1:25] "1995" "1996" "1997" "1998" ...
 $ OBS_STATUS     :'data.frame':    16 obs. of  2 variables:
  ..$ id   : chr [1:16] "B" "C" "D" "E" ...
  ..$ label: chr [1:16] "Break" "Non-publishable and confidential value" "Difference in methodology" "Estimated value" ...
 $ UNIT           :'data.frame':    318 obs. of  2 variables:
  ..$ id   : chr [1:318] "1" "GRWH" "AVGRW" "IDX" ...
  ..$ label: chr [1:318] "RATIOS" "Growth rate" "Average growth rate" "Index" ...
 $ POWERCODE      :'data.frame':    32 obs. of  2 variables:
  ..$ id   : chr [1:32] "0" "1" "2" "3" ...
  ..$ label: chr [1:32] "Units" "Tens" "Hundreds" "Thousands" ...
 $ REFERENCEPERIOD:'data.frame':    98 obs. of  2 variables:
  ..$ id   : chr [1:98] "2013_100" "2012_100" "2011_100" "2010_100" ...
  ..$ label: chr [1:98] "2013=100" "2012=100" "2011=100" "2010=100" ...
 $ TIME_FORMAT    :'data.frame':    5 obs. of  2 variables:
  ..$ id   : chr [1:5] "P1Y" "P1M" "P3M" "P6M" ...
  ..$ label: chr [1:5] "Annual" "Monthly" "Quarterly" "Half-yearly" ...
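
Since VAR has 49 entries, it can help to search its ids and labels rather than read through them by eye. The following is only a sketch: var_tbl is a toy copy of the first four rows shown in the str() output above, and the search pattern is an assumption about what counts as a GVA series. With the real object you would start from dstruc$VAR instead.

```r
# Toy stand-in for dstruc$VAR (ids and labels copied from the str() output above).
# With the real structure, use: var_tbl <- dstruc$VAR
var_tbl <- data.frame(
  id = c("GDP", "GVA_TOTAL", "GVA_IND_TOTAL", "GVA_IND_10_VA"),
  label = c("Regional GDP",
            "Regional Gross Value Added, total activities",
            "Regional Gross Value Added, total activities",
            "GVA in agriculture, forestry and fishing (ISIC rev4)"),
  stringsAsFactors = FALSE
)

# Case-insensitive search across both the id and the label of each variable.
pattern <- "gva|gross value added"
hits <- var_tbl[grepl(pattern, paste(var_tbl$id, var_tbl$label), ignore.case = TRUE), ]

# Ids that can then be used in the VAR position of a filter.
gva_ids <- hits$id
print(hits)
```

The same approach should work for the other elements of dstruc (for example REG_ID, to look up region codes), since each one is a data frame with id and label columns.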

filter_list1 <- list("", "AUS", "", "GVA_IND_TOTAL")
df1 <- get_dataset(dataset = dataset, filter = filter_list1)
head(df1)
# A tibble: 6 x 11
  TL    REG_ID SERIES   VAR           MEAS        POS   TIME  POWERCODE UNIT  REFERENCEPERIOD obsValue
  <chr> <chr>  <chr>    <chr>         <chr>       <chr> <chr> <chr>     <chr> <chr>              <dbl>
1 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1995  0         NA    NA                 67090
2 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1996  0         NA    NA                 68908
3 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1997  0         NA    NA                 71418
4 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1998  0         NA    NA                 73830
5 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1999  0         NA    NA                 75869
6 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   2000  0         NA    NA                 75468

filter_list2 <- list("", "AUS", "", "GVA_DEFLATOR_TOTAL")
df2 <- get_dataset(dataset = dataset, filter = filter_list2)
head(df2)
# A tibble: 6 x 9
  TL    REG_ID SERIES  VAR                MEAS  POS   TIME  POWERCODE obsValue
  <chr> <chr>  <chr>   <chr>              <chr> <chr> <chr> <chr>        <dbl>
1 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1995  0             60.0
2 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1996  0             60.6
3 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1997  0             61.7
4 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1998  0             61.9
5 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1999  0             63.4
6 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   2000  0             65.6

filter_list3 <- list("", "AUS", "", "POP_AVG")
df3 <- get_dataset(dataset = dataset, filter = filter_list3)
head(df3)
# A tibble: 6 x 9
  TL    REG_ID SERIES  VAR     MEAS  POS   TIME  POWERCODE obsValue
  <chr> <chr>  <chr>   <chr>   <chr> <chr> <chr> <chr>        <dbl>
1 1     AUS    SNA_REF POP_AVG PER   ALL   1995  0         18002000
2 1     AUS    SNA_REF POP_AVG PER   ALL   1996  0         18221700
3 1     AUS    SNA_REF POP_AVG PER   ALL   1997  0         18420200
4 1     AUS    SNA_REF POP_AVG PER   ALL   1998  0         18604800
5 1     AUS    SNA_REF POP_AVG PER   ALL   1999  0         18809600
6 1     AUS    SNA_REF POP_AVG PER   ALL   2000  0         19026200
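
The results above do not all share the same columns (df1 has 11, df2 and df3 have 9), so they cannot be passed to rbind() directly. One possible way to stack them into a single long table is to keep only the shared columns. This is a sketch using small toy data frames in place of the real df1 and df3 (values copied from the output above, columns reduced for brevity), not part of the OECD package itself.

```r
# Toy stand-ins for df1 and df3; the real objects come from get_dataset().
df1 <- data.frame(REG_ID = "AUS", VAR = "GVA_IND_TOTAL", MEAS = "PW_REAL_PPP",
                  TIME = c("1995", "1996"), UNIT = NA,
                  obsValue = c(67090, 68908), stringsAsFactors = FALSE)
df3 <- data.frame(REG_ID = "AUS", VAR = "POP_AVG", MEAS = "PER",
                  TIME = c("1995", "1996"),
                  obsValue = c(18002000, 18221700), stringsAsFactors = FALSE)

results <- list(df1, df3)

# Columns present in every result, in the order they appear in the first one.
common <- Reduce(intersect, lapply(results, names))

# Keep only the shared columns, then stack the rows.
long <- do.call(rbind, lapply(results, function(d) d[, common, drop = FALSE]))
print(long)
```

Columns that exist in only some results (UNIT in this toy example) are dropped, so check that nothing you need is lost before relying on the stacked table.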

As the filter_list elements show, the filter is positional: each element corresponds to one dimension, and "" leaves that dimension unfiltered. As shown by dstruc , the variable ( VAR ) is the fourth element in the sequence, so three elements have to come before it. The first one is TL , which is left blank. The second is the region identification ( REG_ID ), set to AUS (Australia) in this example. The third is SERIES , which is also left blank. Trailing dimensions ( MEAS , POS ) are simply omitted.
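
To avoid counting positions by hand, the filter can be built from named values. The sketch below is a hypothetical helper (make_filter is not part of the OECD package), and dim_order is an assumption: it lists the dimensions in the order of the columns returned above (TL, REG_ID, SERIES, VAR, MEAS, POS), which matches the leading entries of dstruc$VAR_DESC$id.

```r
# Hypothetical helper: turn named dimension values into a positional filter list.
# dim_order: dimension ids in the order the dataset expects (assumed, see above).
make_filter <- function(dim_order, ...) {
  wanted <- list(...)
  if (length(wanted) == 0 || is.null(names(wanted)) || any(names(wanted) == "")) {
    stop("Pass at least one value, and name every value after its dimension")
  }
  pos <- match(names(wanted), dim_order)
  if (anyNA(pos)) {
    stop("Unknown dimension(s): ",
         paste(names(wanted)[is.na(pos)], collapse = ", "))
  }
  # Start with "" (unfiltered) up to the last dimension that is actually set.
  filt <- as.list(rep("", max(pos)))
  for (i in seq_along(pos)) {
    filt[[pos[i]]] <- wanted[[i]]
  }
  filt
}

dim_order <- c("TL", "REG_ID", "SERIES", "VAR", "MEAS", "POS")
filter_list <- make_filter(dim_order, REG_ID = "AUS", VAR = "POP_AVG")
str(filter_list)
```

The result has the same shape as filter_list3 above and can be passed on as get_dataset(dataset = dataset, filter = filter_list).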

The package's vignette is really helpful: https://github.com/expersso/OECD .

PS: Sorry for the late reply. Nonetheless, hope this answer helps.


 