Scaling a column cell by a different number based on the number of observations in long-format panel data

I have the following data, which has a panel structure. I need to normalize each cell so that each observation for a country is multiplied by the number of non-missing observations for that country, divided by the total number of rows for that country (10 here; 1100 in my data). I show three countries (AL, UK, FR), but I have 92 in total, so I need a general formula (mutate with by = country?).

This is my data:

library(tibble)

# data_frame() is deprecated; tibble() is the current constructor
df1 <- tibble(Country = 
                c("AL","AL","AL","AL","AL","AL","AL","AL","AL","AL",
                  "UK","UK","UK","UK","UK","UK","UK","UK","UK","UK",
                  "FR","FR","FR","FR","FR","FR","FR","FR","FR","FR"),
              Obs = c(NA,NA,2,3,2,3,2,3,2,NA,1,2,1,2,1,2,1,2,1,2,NA,NA,NA,NA,NA,NA,NA,NA,4,NA))
df1

 Country   Obs
   <chr>   <dbl>
 1 AL         NA
 2 AL         NA
 3 AL          2
 4 AL          3
 5 AL          2
 6 AL          3
 7 AL          2
 8 AL          3
 9 AL          2
10 AL         NA
11 UK          1
12 UK          2
13 UK          1
14 UK          2
15 UK          1
16 UK          2
17 UK          1
18 UK          2
19 UK          1
20 UK          2
21 FR         NA
22 FR         NA
23 FR         NA
24 FR         NA
25 FR         NA
26 FR         NA
27 FR         NA
28 FR         NA
29 FR          4
30 FR         NA

Now, what I want is to multiply each cell by the number of observations available for its country divided by the total number of rows for that country, like so:

# data_frame() is deprecated; tibble() is the current constructor
df2 <- tibble(Country = 
                c("AL","AL","AL","AL","AL","AL","AL","AL","AL","AL",
                  "UK","UK","UK","UK","UK","UK","UK","UK","UK","UK",
                  "FR","FR","FR","FR","FR","FR","FR","FR","FR","FR"),
              Obs = c(NA,NA,2*7/10,3*7/10,2*7/10,3*7/10,2*7/10,3*7/10,2*7/10,
                      NA,1*10/10,2*10/10,1*10/10,2*10/10,1*10/10,2*10/10,1*10/10,
                      2*10/10,1*10/10,2*10/10,NA,NA,NA,NA,NA,NA,NA,NA,4*1/10,NA))

df2

  Country   Obs
   <chr>   <dbl>
 1 AL       NA  
 2 AL       NA  
 3 AL        1.4
 4 AL        2.1
 5 AL        1.4
 6 AL        2.1
 7 AL        1.4
 8 AL        2.1
 9 AL        1.4
10 AL       NA  
11 UK        1  
12 UK        2  
13 UK        1  
14 UK        2  
15 UK        1  
16 UK        2  
17 UK        1  
18 UK        2  
19 UK        1  
20 UK        2  
21 FR       NA  
22 FR       NA  
23 FR       NA  
24 FR       NA  
25 FR       NA  
26 FR       NA  
27 FR       NA  
28 FR       NA  
29 FR        0.4
30 FR       NA 

I am obviously interested in solving this problem, but I would really appreciate it if you could also show me how to do this for multiple columns. My original data needs the same operation applied to many columns, where the country tickers (AL, UK, FR in the example) stay the same.

You can do:

library(dplyr)

# Scale each value by the share of non-NA rows within its country
df1 %>%
  group_by(Country) %>%
  mutate(Obs = Obs * sum(!is.na(Obs)) / n()) %>%
  ungroup()

#  Country   Obs
#   <chr>   <dbl>
# 1 AL       NA  
# 2 AL       NA  
# 3 AL        1.4
# 4 AL        2.1
# 5 AL        1.4
# 6 AL        2.1
# 7 AL        1.4
# 8 AL        2.1
# 9 AL        1.4
#10 AL       NA  
# … with 20 more rows

sum(!is.na(Obs)) counts the number of non-NA values for the Country, whereas n() gives the number of rows for the Country.

For multiple columns:
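To see the two counts side by side, here is a small sketch that rebuilds the question's df1 (using rep() as shorthand, which is not the asker's code) and summarises each country. The column names non_na, rows and weight are made up for illustration.

```r
library(dplyr)

# Same values as df1 in the question, built with rep() for brevity
df1 <- tibble(
  Country = rep(c("AL", "UK", "FR"), each = 10),
  Obs = c(NA, NA, 2, 3, 2, 3, 2, 3, 2, NA,
          1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
          rep(NA, 8), 4, NA)
)

# One row per country: non-NA count, row count, and the resulting multiplier
weights <- df1 %>%
  group_by(Country) %>%
  summarise(
    non_na = sum(!is.na(Obs)),  # observations actually available
    rows   = n(),               # all rows, including NA
    weight = non_na / rows
  )

weights
# AL: 7 / 10 = 0.7, FR: 1 / 10 = 0.1, UK: 10 / 10 = 1
```

These are the 7/10, 10/10 and 1/10 multipliers written out by hand in the question's df2.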

# col1:col4 is a placeholder range; substitute your own column names
df1 %>%
  group_by(Country) %>%
  mutate(across(col1:col4, ~ .x * sum(!is.na(.x)) / n())) %>%
  ungroup()

This will be applied to columns col1 through col4 in your data frame.

Using data.table:
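If the columns are not contiguous, one option is to select them by type instead. The following is a minimal sketch on a hypothetical two-column panel (the names panel, gdp and pop are invented here), assuming dplyr 1.0.0 or later so that across() and where() are available.

```r
library(dplyr)

# Hypothetical panel with two measurement columns; not the asker's data
panel <- tibble(
  Country = rep(c("AL", "UK"), each = 4),
  gdp = c(NA, 2, 4, NA, 1, 2, 3, 4),
  pop = c(10, NA, NA, NA, 5, 5, NA, 5)
)

# Scale every numeric column by its own per-country share of non-NA rows.
# Grouping columns are skipped by across() automatically.
scaled <- panel %>%
  group_by(Country) %>%
  mutate(across(where(is.numeric), ~ .x * sum(!is.na(.x)) / n())) %>%
  ungroup()

scaled
```

Each column gets its own multiplier per country, so gdp for AL is scaled by 2/4 while pop for AL is scaled by 1/4.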

library(data.table)
# setDT() converts df1 to a data.table by reference; := then updates Obs in place
setDT(df1)[, Obs := Obs * mean(!is.na(Obs)), Country]

Or using dplyr:

library(dplyr)
df1 %>%
  group_by(Country) %>%
  mutate(Obs = Obs * mean(!is.na(Obs)))
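Both versions above rely on mean(!is.na(Obs)) rather than sum(!is.na(Obs))/n(). The two are equivalent because the mean of a logical vector is the proportion of TRUE values. Here is a quick base R check using the AL values from the question (the variable names are illustrative only).

```r
# Obs values for country AL, copied from the question
obs <- c(NA, NA, 2, 3, 2, 3, 2, 3, 2, NA)

share_mean  <- mean(!is.na(obs))               # proportion of non-NA entries
share_ratio <- sum(!is.na(obs)) / length(obs)  # non-NA count over row count

all.equal(share_mean, share_ratio)  # TRUE, both are 0.7

# Scaled values; NA entries stay NA
obs * share_mean
```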
