I'm quite new to R and am now trying to get data from the OECD website using the "OECD" package. However, I cannot make it work.
The data I want can be found here: https://stats.oecd.org/Index.aspx?DataSetCode=REGION_ECONOM# . I want to extract GVA, the GVA deflator, population, and GVA by sector from these tables at the regional level. But every time I run the code below, R doesn't produce any output:
library(OECD)
df <- get_dataset("REGION_ECONOM")
If any of you could help me I'd be beyond grateful :D
Two main issues arise here. First, since you did not specify any country or region, downloading the data takes a long time. Second, the filter argument of the OECD package is position-sensitive: the filters must be listed in the same order as the dataset's dimensions.
Here is an example that obtains the data you requested, considering only one country (Australia, code AUS) to keep the answer simple.
library(OECD)
dataset_list <- get_datasets()
search_dataset("regional", data = dataset_list)
# A tibble: 14 x 2
   id                  title
   <chr>               <chr>
 1 REGION_DEMOGR       Regional Demography
 2 REGION_ECONOM       Regional Economy
 3 RWB                 Regional Well-Being
 4 REGION_LABOUR       Regional Labour
 5 REGION_SOCIAL       Regional Social and Environmental indicators
 6 REGION_INNOVATION   Regional Innovation
 7 REG_BUSI_DEMOG_COPY Regional Business Demography copy
 8 SKILLS_2018_REGION  Skill needs - Regional
 9 REG_BUSI_DEMOG      Regional Business Demography
10 REGION_EDUCAT       Regional Education
11 RFD                 Regional Government Finance and Investment Database
12 RHPI                National and Regional House Price Indices
13 REGION_TYPOL        Regional typology
14 RHPI_TARGET         National and Regional House Price Indices - Headline indicators
dataset <- "REGION_ECONOM"
dstruc <- get_data_structure(dataset)
str(dstruc)
List of 13
$ VAR_DESC :'data.frame': 13 obs. of 2 variables:
..$ id : chr [1:13] "TL" "REG_ID" "SERIES" "VAR" ...
..$ description: chr [1:13] "Territory Level and Typology" "Region" "SNA Classification" "Indicator" ...
$ TL :'data.frame': 17 obs. of 2 variables:
..$ id : chr [1:17] "1" "RURB" "1_PU" "1_IN" ...
..$ label: chr [1:17] "Country" "Country values by rural/urban typology" " Country - predominantly urban regions" " Country - intermediate regions" ...
$ REG_ID :'data.frame': 3364 obs. of 2 variables:
..$ id : chr [1:3364] "AUS" "AU1" "AU101" "AU103" ...
..$ label: chr [1:3364] "Australia" "New South Wales" "Capital Region" "Central West" ...
$ SERIES :'data.frame': 3 obs. of 2 variables:
..$ id : chr [1:3] "SNA_2008" "SNA_1993" "SNA_REF"
..$ label: chr [1:3] "Last SNA classification (SNA 2008 or latest available)" "Previous SNA classification (SNA 1993, discontinued series)" "Reference data"
$ VAR :'data.frame': 49 obs. of 2 variables:
..$ id : chr [1:49] "GDP" "GVA_TOTAL" "GVA_IND_TOTAL" "GVA_IND_10_VA" ...
..$ label: chr [1:49] "Regional GDP" "Regional Gross Value Added, total activities" "Regional Gross Value Added, total activities" "GVA in agriculture, forestry and fishing (ISIC rev4)" ...
$ MEAS :'data.frame': 29 obs. of 2 variables:
..$ id : chr [1:29] "REG" "CURR_PR" "USD_PPP" "REAL_PR" ...
..$ label: chr [1:29] "Regional values (in millions)" " Millions National currency, current prices" " Millions USD, current prices, current PPP" " Millions National currency, constant prices, base year 2015" ...
$ POS :'data.frame': 4 obs. of 2 variables:
..$ id : chr [1:4] "ALL" "MAX" "MIN" "AVG"
..$ label: chr [1:4] "All regions" "Highest regional value in the country by Territorial Level and selected indicators" "Lowest regional value in the country by Territorial Level and selected indicators" "National average"
$ TIME :'data.frame': 25 obs. of 2 variables:
..$ id : chr [1:25] "1995" "1996" "1997" "1998" ...
..$ label: chr [1:25] "1995" "1996" "1997" "1998" ...
$ OBS_STATUS :'data.frame': 16 obs. of 2 variables:
..$ id : chr [1:16] "B" "C" "D" "E" ...
..$ label: chr [1:16] "Break" "Non-publishable and confidential value" "Difference in methodology" "Estimated value" ...
$ UNIT :'data.frame': 318 obs. of 2 variables:
..$ id : chr [1:318] "1" "GRWH" "AVGRW" "IDX" ...
..$ label: chr [1:318] "RATIOS" "Growth rate" "Average growth rate" "Index" ...
$ POWERCODE :'data.frame': 32 obs. of 2 variables:
..$ id : chr [1:32] "0" "1" "2" "3" ...
..$ label: chr [1:32] "Units" "Tens" "Hundreds" "Thousands" ...
$ REFERENCEPERIOD:'data.frame': 98 obs. of 2 variables:
..$ id : chr [1:98] "2013_100" "2012_100" "2011_100" "2010_100" ...
..$ label: chr [1:98] "2013=100" "2012=100" "2011=100" "2010=100" ...
$ TIME_FORMAT :'data.frame': 5 obs. of 2 variables:
..$ id : chr [1:5] "P1Y" "P1M" "P3M" "P6M" ...
..$ label: chr [1:5] "Annual" "Monthly" "Quarterly" "Half-yearly" ...
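To find the codes for a particular indicator (for example, GVA by sector), it may be easier to search the label column of the relevant dstruc element than to read the str() output. A minimal sketch follows. So that it runs without a network connection, it rebuilds only the first four rows of dstruc$VAR printed above as a stand-in, and the search pattern is just an illustrative choice.

```r
# Stand-in for dstruc$VAR: only the first four rows printed by str(dstruc) above.
var_tbl <- data.frame(
  id = c("GDP", "GVA_TOTAL", "GVA_IND_TOTAL", "GVA_IND_10_VA"),
  label = c(
    "Regional GDP",
    "Regional Gross Value Added, total activities",
    "Regional Gross Value Added, total activities",
    "GVA in agriculture, forestry and fishing (ISIC rev4)"
  ),
  stringsAsFactors = FALSE
)

# Keep the rows whose label mentions GVA, spelled out or abbreviated
# (case-insensitive match).
gva_vars <- var_tbl[grepl("gross value added|gva", var_tbl$label,
                          ignore.case = TRUE), ]
print(gva_vars)
```

With the real object, the same grepl() line would be applied to dstruc$VAR instead of var_tbl.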
filter_list1 <- list("", "AUS", "", "GVA_IND_TOTAL")
df1 <- get_dataset(dataset = dataset, filter = filter_list1)
head(df1)
# A tibble: 6 x 11
  TL    REG_ID SERIES   VAR           MEAS        POS   TIME  POWERCODE UNIT  REFERENCEPERIOD obsValue
  <chr> <chr>  <chr>    <chr>         <chr>       <chr> <chr> <chr>     <chr> <chr>              <dbl>
1 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1995  0         NA    NA                 67090
2 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1996  0         NA    NA                 68908
3 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1997  0         NA    NA                 71418
4 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1998  0         NA    NA                 73830
5 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   1999  0         NA    NA                 75869
6 1     AUS    SNA_2008 GVA_IND_TOTAL PW_REAL_PPP AVG   2000  0         NA    NA                 75468
filter_list2 <- list("", "AUS", "", "GVA_DEFLATOR_TOTAL")
df2 <- get_dataset(dataset = dataset, filter = filter_list2)
head(df2)
# A tibble: 6 x 9
  TL    REG_ID SERIES  VAR                MEAS  POS   TIME  POWERCODE obsValue
  <chr> <chr>  <chr>   <chr>              <chr> <chr> <chr> <chr>        <dbl>
1 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1995  0             60.0
2 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1996  0             60.6
3 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1997  0             61.7
4 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1998  0             61.9
5 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   1999  0             63.4
6 1     AUS    SNA_REF GVA_DEFLATOR_TOTAL RATES ALL   2000  0             65.6
filter_list3 <- list("", "AUS", "", "POP_AVG")
df3 <- get_dataset(dataset = dataset, filter = filter_list3)
head(df3)
# A tibble: 6 x 9
  TL    REG_ID SERIES  VAR     MEAS  POS   TIME  POWERCODE obsValue
  <chr> <chr>  <chr>   <chr>   <chr> <chr> <chr> <chr>        <dbl>
1 1     AUS    SNA_REF POP_AVG PER   ALL   1995  0         18002000
2 1     AUS    SNA_REF POP_AVG PER   ALL   1996  0         18221700
3 1     AUS    SNA_REF POP_AVG PER   ALL   1997  0         18420200
4 1     AUS    SNA_REF POP_AVG PER   ALL   1998  0         18604800
5 1     AUS    SNA_REF POP_AVG PER   ALL   1999  0         18809600
6 1     AUS    SNA_REF POP_AVG PER   ALL   2000  0         19026200
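The three requests above could probably be combined into one, because each element of the filter list can be a character vector of codes (the package README uses that form for countries). The sketch below only builds such a filter and then, on a small stand-in data frame made from the first rows printed above, shows how a combined result might be split back by VAR. The get_dataset() call is left commented out because it needs network access, and whether the OECD API accepts this combined request is an assumption, not something tested here.

```r
# One filter for all three indicators; positions follow TL, REG_ID, SERIES, VAR.
vars <- c("GVA_IND_TOTAL", "GVA_DEFLATOR_TOTAL", "POP_AVG")
filter_all <- list("", "AUS", "", vars)

# Assumed call, not executed here (requires network access):
# df_all <- OECD::get_dataset("REGION_ECONOM", filter = filter_all)

# Stand-in for df_all: the 1995 values from the three outputs above,
# with only the columns needed for the illustration.
df_all <- data.frame(
  VAR = vars,
  TIME = c("1995", "1995", "1995"),
  obsValue = c(67090, 60.0, 18002000),
  stringsAsFactors = FALSE
)

# Split the combined result into one data frame per indicator.
by_var <- split(df_all, df_all$VAR)
print(names(by_var))
```

Keeping one request per indicator, as in the answer, is also reasonable: the columns returned differ between indicators (df1 has UNIT and REFERENCEPERIOD, df2 and df3 do not).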
As the filter_list elements show, we need to add "" as a placeholder for every dimension that comes before the variable we are interested in and that we do not want to filter on. As shown by dstruc, the variable (VAR) is the fourth element in the sequence. The first one is TL, which is left blank. Second, there is the region identifier (REG_ID); in this example, we set it to AUS (Australia). Third, there is SERIES, which is also left blank.
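Since the position of each element matters, one way to avoid counting empty strings by hand is to build the list from named values. The helper below, make_filter(), is hypothetical and not part of the OECD package; it is a minimal sketch that assumes the dimension order TL, REG_ID, SERIES, VAR described above (in practice the order could be read from dstruc$VAR_DESC$id).

```r
# Dimension order as described in the answer (first four dimensions only).
dims <- c("TL", "REG_ID", "SERIES", "VAR")

# Hypothetical helper (not part of the OECD package): turn named values into
# a positional filter list, using "" for every dimension left unfiltered.
make_filter <- function(dims, ...) {
  wanted <- list(...)
  unknown <- setdiff(names(wanted), dims)
  if (length(unknown) > 0) {
    stop("Unknown dimension(s): ", paste(unknown, collapse = ", "))
  }
  lapply(dims, function(d) if (is.null(wanted[[d]])) "" else wanted[[d]])
}

filter_list <- make_filter(dims, REG_ID = "AUS", VAR = "GVA_IND_TOTAL")
str(filter_list)
```

The result has the same shape as filter_list1 above and could be passed to get_dataset() in the same way.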
The package's vignette is really helpful: https://github.com/expersso/OECD .
PS: Sorry for the late reply. Nonetheless, hope this answer helps.