
How to include missing countries in a df in R

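This question is a follow-up to my previous post.

I have a large data frame (900k rows) about mergers and acquisitions (M&As).

The df has four columns: date (the year the M&A was completed), target_nation (the country of the company that was merged or acquired), acquiror_nation (the country of the acquiring company), and big_corp_TF (whether the acquiror was a big corporation; TRUE means it was).

Here is a sample of my df: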

    df <- structure(list(date = c(2000L, 2000L, 2001L, 2001L, 2001L, 2003L, 
2003L, 1999L, 2001L, 2002L, 2002L, 2002L), target_nation = c("Uganda", 
"Uganda", "Uganda", "Uganda", "Uganda", "Uganda", "Mozambique", 
"Mozambique", "Mozambique", "Mozambique", "Mozambique", "Mozambique"
), acquiror_nation = c("France", "Germany", "France", "France", 
"Germany", "Germany", "Germany", "Germany", "France", "France", 
"Germany", "Japan"), big_corp_TF = c(TRUE, FALSE, TRUE, FALSE, 
FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)), row.names = c(NA, 
-12L), class = c("data.table", "data.frame"))

> df

  date target_nation acquiror_nation big_corp_TF
 1: 2000        Uganda          France        TRUE
 2: 2000        Uganda         Germany       FALSE
 3: 2001        Uganda          France        TRUE
 4: 2001        Uganda          France       FALSE
 5: 2001        Uganda         Germany       FALSE
 6: 2003        Uganda         Germany        TRUE
 7: 2003    Mozambique         Germany       FALSE
 8: 1999    Mozambique         Germany       FALSE
 9: 2001    Mozambique          France        TRUE
10: 2002    Mozambique          France       FALSE
11: 2002    Mozambique         Germany        TRUE
12: 2002    Mozambique           Japan        TRUE

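From these data, I want to create a new column giving the share of M&As in a particular target country that were made by big corporations from a particular acquiror nation, computed over a 2-year window. (For my actual exercise I will use a 5-year window, but let's keep it simple here.)

There is a set of acquiror nations I am particularly interested in (in this example, say France, Germany and Japan). I would like a column giving the share described above for these countries.

@AnilGoyal previously helped me write the code. Here it is: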

library(dplyr)
library(tidyr)
library(runner)  # provides sum_run()

df_calc <- df %>%
  mutate(d = 1) %>%
  group_by(target_nation) %>%
  complete(date = seq(min(date), max(date), 1), nesting(acquiror_nation),
           fill = list(d = 0, big_corp_TF = FALSE)) %>%
  group_by(date, target_nation) %>%
  mutate(total_MAs = sum(d)) %>%
  group_by(date, target_nation, acquiror_nation) %>%
  summarise(total_MAs = mean(total_MAs),
            total_MAs_bigcorp = sum(big_corp_TF), .groups = 'drop') %>%
  group_by(target_nation, acquiror_nation) %>%
  mutate(share = sum_run(total_MAs_bigcorp, k=2)/sum_run(total_MAs, k=2))

Here is the output:

  date   targ_nat    acq_nat tot_MA big_MA  share
1   1999    Mozambique  France  1   0   0.0000000
2   1999    Mozambique  Germany 1   0   0.0000000
3   1999    Mozambique  Japan   1   0   0.0000000
4   2000    Mozambique  France  0   0   0.0000000
5   2000    Mozambique  Germany 0   0   0.0000000
6   2000    Mozambique  Japan   0   0   0.0000000
7   2001    Mozambique  France  1   1   1.0000000
8   2001    Mozambique  Germany 1   0   0.0000000
9   2001    Mozambique  Japan   1   0   0.0000000
10  2002    Mozambique  France  3   0   0.2500000
11  2002    Mozambique  Germany 3   1   0.2500000
12  2002    Mozambique  Japan   3   1   0.2500000
13  2003    Mozambique  France  1   0   0.0000000
14  2003    Mozambique  Germany 1   0   0.2500000
15  2003    Mozambique  Japan   1   0   0.2500000
16  2000    Uganda     France   2   1   0.5000000
17  2000    Uganda    Germany   2   0   0.0000000
18  2001    Uganda    France    3   1   0.4000000
19  2001    Uganda    Germany   3   0   0.0000000
20  2002    Uganda    France    0   0   0.3333333
21  2002    Uganda    Germany   0   0   0.0000000
22  2003    Uganda    France    1   0   0.0000000
23  2003    Uganda    Germany   1   1   1.0000000
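As a sanity check on the share column, the France/Uganda values can be reproduced by hand. The sketch below uses only base R; `roll_sum` is a hypothetical helper written for this illustration (not part of any package), assumed to mimic what `sum_run(x, k = 2)` does in the code above, including the shorter window for the first year.

```r
# Yearly totals for target_nation == "Uganda", years 2000-2003 (from the output above)
total_MAs <- c(2, 3, 0, 1)
# Big-corporation M&As by France in Uganda for the same years
big_MAs   <- c(1, 1, 0, 0)

# Hypothetical helper: trailing sum over the last k elements,
# using a shorter window where fewer than k elements are available
roll_sum <- function(x, k) {
  vapply(seq_along(x),
         function(i) sum(x[max(1, i - k + 1):i]),
         numeric(1))
}

share <- roll_sum(big_MAs, 2) / roll_sum(total_MAs, 2)
print(share)
# [1] 0.5000000 0.4000000 0.3333333 0.0000000
```

This is why the 2002 row shows a share of 0.333 even though there were no M&As in Uganda that year: the window still includes 2001.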

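All the numbers are as desired. However, I would also like to have results for Japan in Uganda, and I have not managed to get them. How can I do this? As I understand it, there are no results for Japan in Uganda because Japan made no investments in Uganda in any year (as the data sample above shows); but this absence of investment is a meaningful result for me, and I would like Japan to appear as an acquiror nation there too. Like this (for space reasons I have left out Mozambique as targ_nat):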

  date   targ_nat    acq_nat tot_MA big_MA  share
16  2000    Uganda     France   2   1   0.5000000
17  2000    Uganda    Germany   2   0   0.0000000
18  2000    Uganda    Japan     2   0   0.0000000
19  2001    Uganda    France    3   1   0.4000000
20  2001    Uganda    Germany   3   0   0.0000000
21  2001    Uganda    Japan     3   0   0.0000000
22  2002    Uganda    France    0   0   0.3333333
23  2002    Uganda    Germany   0   0   0.0000000
24  2002    Uganda    Japan     0   0   0.0000000
25  2003    Uganda    France    1   0   0.0000000
26  2003    Uganda    Germany   1   1   1.0000000
27  2003    Uganda    Japan     1   0   0.0000000

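Any ideas on how to achieve this? For my actual purposes, I have a set of 13 countries whose results I want to see as acquiror nations (not just France, Germany and Japan). These countries all appear in the dataset as acquiror nations, but not for every target_nation, just like Japan and Uganda in this example.

Any help is much appreciated.

This needs complete: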

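library(dplyr)
library(tidyr)
# Within each target_nation/date group, add a row for every acquiror nation
# that appears anywhere in df_calc; the new rows get 0 big-corp M&As and share 0
out <- df_calc %>% 
   group_by(target_nation, date, total_MAs) %>%
   complete(acquiror_nation = unique(.$acquiror_nation),
            fill = list(total_MAs_bigcorp = 0, share = 0)) %>%
   ungroup()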

- checking the output for 'Uganda'

out %>% 
   filter(target_nation == 'Uganda')
# A tibble: 12 x 6
#   target_nation  date total_MAs acquiror_nation total_MAs_bigcorp share
#   <chr>         <dbl>     <dbl> <chr>                       <dbl> <dbl>
# 1 Uganda         2000         2 France                          1 0.5  
# 2 Uganda         2000         2 Germany                         0 0    
# 3 Uganda         2000         2 Japan                           0 0    
# 4 Uganda         2001         3 France                          1 0.4  
# 5 Uganda         2001         3 Germany                         0 0    
# 6 Uganda         2001         3 Japan                           0 0    
# 7 Uganda         2002         0 France                          0 0.333
# 8 Uganda         2002         0 Germany                         0 0    
# 9 Uganda         2002         0 Japan                           0 0    
#10 Uganda         2003         1 France                          0 0    
#11 Uganda         2003         1 Germany                         1 1    
#12 Uganda         2003         1 Japan                           0 0    
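For the real data, where only a fixed set of 13 acquiror nations matters, one possible variation is to pass that set to complete explicitly instead of taking every nation found in the data. The following is a minimal sketch under assumptions: `df_small` is a hypothetical cut-down copy of the Uganda rows of df_calc, and `nations_of_interest` is an assumed name for the vector of countries. As in the answer above, it assumes that a nation with no M&As in a target country should simply get a share of 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of df_calc (values copied from the Uganda rows above)
df_small <- tibble(
  target_nation     = "Uganda",
  date              = c(2000, 2000, 2001, 2001),
  total_MAs         = c(2, 2, 3, 3),
  acquiror_nation   = c("France", "Germany", "France", "Germany"),
  total_MAs_bigcorp = c(1, 0, 1, 0),
  share             = c(0.5, 0, 0.4, 0)
)

# Assumed name: the acquiror nations that should appear for every target and year
nations_of_interest <- c("France", "Germany", "Japan")

out_small <- df_small %>%
  group_by(target_nation, date, total_MAs) %>%
  # add a row for each nation of interest that is missing from the group
  complete(acquiror_nation = nations_of_interest,
           fill = list(total_MAs_bigcorp = 0, share = 0)) %>%
  ungroup()

out_small
```

Rows for acquiror nations outside the vector are kept as they are, because complete only adds missing combinations; a filter on acquiror_nation afterwards would drop them if they are not wanted.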

