
R: Average years in time series per group

Dear community,

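I am working with R and looking at trends in time series data of bilateral exports over 20 years. Since the data fluctuates a lot between years (and is not 100% reliable), I would prefer to use four-year averages (rather than looking at each year separately) to analyse how the main export partners have changed over time. I have the following dataset, called GrossExp3, which covers the bilateral exports (in 1000 USD) of 15 reporter countries to all available partner countries for all years between 1998 and 2019. It contains four variables: Year, ReporterName (= exporter), PartnerName (= export destination) and 'TradeValue in 1000 USD' (= export value to the destination). In the raw data, the PartnerName column also includes an entry called "All", which is the sum of all exports of the reporter in each year; the "All" rows and the year 2019 are filtered out when GrossExp3 is built in the code below.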

Here is a summary of my data:

> summary(GrossExp3)
      Year      ReporterName       PartnerName        TradeValue in 1000 USD
 Min.   :1998   Length:35961       Length:35961       Min.   :       0      
 1st Qu.:2004   Class :character   Class :character   1st Qu.:      39      
 Median :2009   Mode  :character   Mode  :character   Median :     597      
 Mean   :2009                                         Mean   :  134370      
 3rd Qu.:2014                                         3rd Qu.:   10090      
 Max.   :2018                                         Max.   :47471515

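My goal is to return a table that shows, for each exporter, the share (in percent) of each export destination in the exporter's total exports over a given period. Instead of individual years, I would like averaged data for the following periods: 2000-2003, 2004-2007, 2008-2011, 2012-2015 and 2016-2019.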
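Note that "average" can mean two different things here. A partner's share of a period's total exports (the partner's exports summed over the four years, divided by all exports summed over the same years) is a value-weighted average of the yearly shares, and it generally differs from the plain mean of the four yearly percentages. A minimal base R sketch with made-up numbers for one hypothetical reporter/partner pair:

```r
# Hypothetical yearly figures for one reporter and one partner (made-up numbers)
partner_exports <- c(10, 30, 20, 40)     # exports to the partner, 4 consecutive years
total_exports   <- c(100, 100, 50, 250)  # reporter's total exports in the same years

# Partner's share of the whole period's exports (pooled, i.e. value-weighted)
pooled_share <- 100 * sum(partner_exports) / sum(total_exports)
pooled_share  # 20

# Plain mean of the four yearly percentages (10, 30, 40, 16)
yearly_share <- 100 * partner_exports / total_exports
mean_share   <- mean(yearly_share)
mean_share    # 24
```

Summing the trade values per period before dividing, as the answer below does, gives the pooled share, which is what "share of total exports in that period" describes.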

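What I tried: my current code (created with the support of this amazing community) is below. At the moment it shows the data for each year separately, but I need the period averages described in the title.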

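# load packages
# plyr is loaded *before* dplyr: if plyr is attached after dplyr, it masks
# dplyr's mutate(), arrange(), summarise()/summarize() etc. and grouped code breaks
library(plyr)
library(data.table)
library(dplyr)
library(tidyr)
library(stringr)
library(visdat)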

# set working directory
setwd("C:/R/R_09.2020/Other Indicators/Bilateral Trade Shift of Partners")

# load data

# create a file path SITC 3 
path1 <- file.path("SITC Rev 3_Data from 1998.csv")

# load the csv data as a data.table called "SITC3" (dropping columns 1, 9, 11 and 13)
SITC3 <- fread(path1, drop = c(1,9,11,13))

# prepare data (SITC3) for analysis
# Filter for GROSS EXPORTS SITC3 (Gross exports = Exports that include intermediate products)
GrossExp3 <- SITC3 %>%
  filter(TradeFlowName == "Gross Exp.", PartnerISO3 != "All", Year != 2019) %>%  # filter for gross exports, remove "All", remove 2019
  select(Year, ReporterName, PartnerName, `TradeValue in 1000 USD`) %>%
  arrange(ReporterName, desc(Year))
# compare with the unfiltered data
summary(GrossExp3)
summary(SITC3)

# calculate each partner's percentage of the reporter's total exports per year
GrossExp3Main <- GrossExp3 %>%
  group_by(Year, ReporterName) %>%
  add_tally(wt = `TradeValue in 1000 USD`, name = "TotalValue") %>%
  mutate(Percentage = 100 * (`TradeValue in 1000 USD` / TotalValue)) %>%
  arrange(ReporterName, desc(Year), desc(Percentage))
head(GrossExp3Main, n = 20)

# print tables in separate sheets to get an overview about hierarchy of export partners and development over time
SpreadExpMain <- GrossExp3Main %>%
  select(Year, ReporterName, PartnerName, Percentage) %>%
  spread(key = Year, value = Percentage) %>%
  arrange(ReporterName, desc(`2018`))
View(SpreadExpMain) # shows whole table
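A simple way to check the yearly percentages is to confirm that they add up to 100 within each Reporter/Year group. A base R sketch on a small hypothetical data frame (the name toy_main, the column TradeValue standing in for `TradeValue in 1000 USD`, and all values are made up):

```r
# Hypothetical stand-in for GrossExp3Main (names and values are made up)
toy_main <- data.frame(
  Year         = c(2017, 2017, 2018, 2018, 2018),
  ReporterName = "ReporterA",
  PartnerName  = c("PartnerB", "PartnerC", "PartnerB", "PartnerC", "PartnerD"),
  TradeValue   = c(80, 20, 50, 30, 20)
)

# Yearly share of each partner in the reporter's total, in percent
toy_main$Percentage <- 100 * toy_main$TradeValue /
  ave(toy_main$TradeValue, toy_main$Year, toy_main$ReporterName, FUN = sum)

# Each Reporter/Year group should add up to 100 (up to floating-point error)
check <- aggregate(Percentage ~ Year + ReporterName, data = toy_main, FUN = sum)
check
all(abs(check$Percentage - 100) < 1e-8)
```

On the real data, a comparable dplyr check would be `GrossExp3Main %>% dplyr::summarise(Total = sum(Percentage))`, which works because GrossExp3Main is still grouped by Year and ReporterName at that point; every Total should be 100 up to floating-point error.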

Here is the head of my data:

> head(GrossExp3Main, n = 20)
# A tibble: 20 x 6
# Groups:   Year, ReporterName [7]
    Year ReporterName PartnerName   `TradeValue in 100~ TotalValue Percentage
   <int> <chr>        <chr>                       <dbl>      <dbl>      <dbl>
 1  2018 Angola       China                   24517058.  42096736.      58.2 
 2  2018 Angola       India                    3768940.  42096736.       8.95
 3  2017 Angola       China                   19487067.  34904881.      55.8 
 4  2017 Angola       India                    2890061.  34904881.       8.28
 5  2016 Angola       China                   13923092.  28057500.      49.6 
 6  2016 Angola       India                    1948845.  28057500.       6.95
 7  2016 Angola       United States            1525650.  28057500.       5.44
 8  2015 Angola       China                   14320566.  33924937.      42.2 
 9  2015 Angola       India                    2676340.  33924937.       7.89
10  2015 Angola       Spain                    2245976.  33924937.       6.62
11  2014 Angola       China                   27527111.  58672369.      46.9 
12  2014 Angola       India                    4507416.  58672369.       7.68
13  2014 Angola       Spain                    3726455.  58672369.       6.35
14  2013 Angola       China                   31947235.  67712527.      47.2 
15  2013 Angola       India                    6764233.  67712527.       9.99
16  2013 Angola       United States            5018391.  67712527.       7.41
17  2013 Angola       Other Asia, ~            4007020.  67712527.       5.92
18  2012 Angola       China                   33710030.  70863076.      47.6 
19  2012 Angola       India                    6932061.  70863076.       9.78
20  2012 Angola       United States            6594526.  70863076.       9.31

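I'm not sure whether my results so far are correct. In addition, I have the following questions:

  • Do you have any suggestions on how to print nice tables with R?
  • How can I best round the percentage data to one decimal place?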
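For the rounding question, base R offers `round()`, which keeps the value numeric, and `sprintf()`, which turns it into text with a fixed number of decimals (useful for display only). A small sketch using the first three Angola 2018 percentages from the dput() sample:

```r
# First three Percentage values of the dput() sample (Angola, 2018)
pct <- c(58.2398078593471, 8.9530467213552, 3.49227247731025)

# Numeric result, rounded to one decimal place (still usable in calculations)
round(pct, 1)        # -> 58.2 9.0 3.5

# Character result that always shows exactly one decimal (for printing)
sprintf("%.1f", pct) # -> "58.2" "9.0" "3.5"
```

Inside the dplyr pipeline the same idea becomes `mutate(Percentage = round(Percentage, 1))`. For printing, `knitr::kable()` has a `digits` argument (e.g. `knitr::kable(SpreadExpMain, digits = 1)`), which is one option for the "nice tables" question, assuming the knitr package is installed.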

Since I have been stuck on these problems for a week, I would be very grateful for any advice on how to solve them!

Have a nice weekend and all the best,

梅利克

**EDIT** Here is some sample data:

dput(head(GrossExp3Main, n = 20))
structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L), ReporterName = c("Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola", "Angola", "Angola", 
"Angola", "Angola", "Angola", "Angola", "Angola"), PartnerName = c("China", 
"India", "United States", "Spain", "South Africa", "Portugal", 
"United Arab Emirates", "France", "Thailand", "Canada", "Indonesia", 
"Singapore", "Italy", "Israel", "United Kingdom", "Unspecified", 
"Namibia", "Uruguay", "Congo, Rep.", "Japan"), `TradeValue in 1000 USD` = c(24517058.342, 
3768940.47, 1470132.736, 1250554.873, 1161852.097, 1074137.369, 
884725.078, 734551.345, 649626.328, 647164.297, 575477.283, 513982.584, 
468914.918, 452453.482, 425616.975, 423008.886, 327921.516, 320586.229, 
299119.102, 264671.779), TotalValue = c(42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31, 42096736.31, 42096736.31, 
42096736.31, 42096736.31, 42096736.31), Percentage = c(58.2398078593471, 
8.9530467213552, 3.49227247731025, 2.97066942147468, 2.75995765667944, 
2.55159298119945, 2.10164767046284, 1.74491281127062, 1.54317504144777, 
1.53732653342598, 1.3670353890672, 1.22095589599877, 1.11389850877492, 
1.07479467925527, 1.01104506502775, 1.00484959899258, 0.778971352043039, 
0.761546516668669, 0.710551762961598, 0.62872279943737)), row.names = c(NA, 
-20L), groups = structure(list(Year = 2018L, ReporterName = "Angola", 
    .rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

To do what you want, you need an additional variable that groups the years together. I used cut() to create it.
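To see how the breaks behave, the same cut() call can be applied to a plain vector of years. Each break marks the first year of a group and right = FALSE makes the intervals closed on the left, so 2003 falls into "2000-2003" and 2004 into "2004-2007"; 1998 and 1999 lie below the first break and become NA. A quick base R check:

```r
# Same breaks and labels as in the answer's code
year_group_break  <- c(2000, 2004, 2008, 2012, 2016, 2020)
year_group_labels <- c("2000-2003", "2004-2007", "2008-2011", "2012-2015", "2016-2019")

years <- 1998:2019
year_group <- cut(years, breaks = year_group_break, labels = year_group_labels,
                  include.lowest = TRUE, right = FALSE)

# Which group does each year fall into? Years before 2000 become NA
data.frame(Year = years, year_group = year_group)
```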

library(dplyr)

# Define the cut breaks and labels for each year group.
# Each break marks the first year of a group. With right = FALSE, cut() builds
# intervals that are closed on the left, e.g. [2000, 2004) -> "2000-2003";
# include.lowest = TRUE also closes the last interval, [2016, 2020].
# Years before 2000 (1998, 1999) fall outside all groups and become NA.
year_group_break <- c(2000, 2004, 2008, 2012, 2016, 2020)
year_group_labels <- c("2000-2003", "2004-2007", "2008-2011", "2012-2015", "2016-2019")

# GrossExp3 is the data set from the question
GrossExp3 %>%
  # create the year group variable
  mutate(year_group = cut(Year, breaks = year_group_break,
    labels = year_group_labels,
    include.lowest = TRUE, right = FALSE)) %>%
  # drop years that are not in any group (1998, 1999)
  filter(!is.na(year_group)) %>%
  # calculate the total value for each Reporter + Partner in each year group
  group_by(year_group, ReporterName, PartnerName) %>%
  summarize(`TradeValue in 1000 USD` = sum(`TradeValue in 1000 USD`),
    .groups = "drop") %>%
  # calculate each Partner's share of the Reporter's total in each year group
  # (a fraction between 0 and 1; multiply by 100 for a percentage)
  group_by(year_group, ReporterName) %>%
  mutate(Percentage = `TradeValue in 1000 USD` / sum(`TradeValue in 1000 USD`)) %>%
  ungroup()

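Sample output (computed from the 20-row dput() sample above, which only contains 2018 data; the shares are therefore relative to those 20 partners, not to Angola's full 2018 total):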

   year_group ReporterName PartnerName          `TradeValue in 1000 USD` Percentage
   <fct>      <chr>        <chr>                                   <dbl>      <dbl>
 1 2016-2019  Angola       Canada                                647164.    0.0161 
 2 2016-2019  Angola       China                               24517058.    0.609  
 3 2016-2019  Angola       Congo, Rep.                           299119.    0.00744
 4 2016-2019  Angola       France                                734551.    0.0183 
 5 2016-2019  Angola       India                                3768940.    0.0937 
 6 2016-2019  Angola       Indonesia                             575477.    0.0143 
 7 2016-2019  Angola       Israel                                452453.    0.0112 
 8 2016-2019  Angola       Italy                                 468915.    0.0117 
 9 2016-2019  Angola       Japan                                 264672.    0.00658
10 2016-2019  Angola       Namibia                               327922.    0.00815
11 2016-2019  Angola       Portugal                             1074137.    0.0267 
12 2016-2019  Angola       Singapore                             513983.    0.0128 
13 2016-2019  Angola       South Africa                         1161852.    0.0289 
14 2016-2019  Angola       Spain                                1250555.    0.0311 
15 2016-2019  Angola       Thailand                              649626.    0.0161 
16 2016-2019  Angola       United Arab Emirates                  884725.    0.0220 
17 2016-2019  Angola       United Kingdom                        425617.    0.0106 
18 2016-2019  Angola       United States                        1470133.    0.0365 
19 2016-2019  Angola       Unspecified                           423009.    0.0105 
20 2016-2019  Angola       Uruguay                               320586.    0.00797
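For readers who want to cross-check the dplyr result, or who prefer base R, the same period shares can be sketched with aggregate() and ave(), followed by reshape() to get a wide table with one column per period (similar to the question's spread() step). The toy data is hypothetical and only mimics GrossExp3's layout; the column TradeValue stands in for `TradeValue in 1000 USD`, and the shares are multiplied by 100 and rounded to one decimal place so they read as percentages.

```r
# Hypothetical toy data in the layout of GrossExp3 (names and values are made up)
toy <- data.frame(
  Year         = c(2013, 2013, 2016, 2016, 2018, 2018),
  ReporterName = "ReporterA",
  PartnerName  = c("PartnerB", "PartnerC", "PartnerB", "PartnerC", "PartnerB", "PartnerC"),
  TradeValue   = c(30, 70, 60, 40, 90, 10)  # stands in for `TradeValue in 1000 USD`
)

breaks <- c(2000, 2004, 2008, 2012, 2016, 2020)
labels <- c("2000-2003", "2004-2007", "2008-2011", "2012-2015", "2016-2019")
toy$year_group <- cut(toy$Year, breaks = breaks, labels = labels,
                      include.lowest = TRUE, right = FALSE)

# 1) Sum the trade values per year group, reporter and partner
period <- aggregate(TradeValue ~ year_group + ReporterName + PartnerName,
                    data = toy, FUN = sum)

# 2) Share of each partner in the reporter's total for that year group, in percent
period$Percentage <- round(
  100 * period$TradeValue /
    ave(period$TradeValue, period$year_group, period$ReporterName, FUN = sum),
  1
)

# 3) Wide table: one row per reporter/partner, one Percentage column per year group
wide <- reshape(period[, c("year_group", "ReporterName", "PartnerName", "Percentage")],
                idvar = c("ReporterName", "PartnerName"),
                timevar = "year_group", direction = "wide")
wide
```

Here PartnerB's share is 30% in 2012-2015 and (60 + 90) / 200 = 75% in 2016-2019; reshape() names the new columns by pasting the value column and the period, e.g. "Percentage.2016-2019".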


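Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.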

 
© 2020-2024 STACKOOM.COM