How to sum and weight certain rows in a data frame in R?

I currently have a data.frame as follows:

  State      Area_name    LessHSD        HSD    SomeCAD   BDorMore P_LessHSD P_HSD ZIP
1    US  United States 26,948,057 59,265,308 63,365,655 68,867,051      12.3  27.1 1009
1913 NY Richmond County    37,675    101,738     81,014    108,326      11.5  30.9 36085
2    AL        Alabama    470,043  1,020,172    987,148    822,595      14.2  30.9 1020
3    AL Autauga County      4,204     12,119     10,552     10,291      11.3  32.6 7080
1873 NY Bronx County      258,956    255,427    226,620    183,134       28   27.6 36005
1911 NY Queens County     303,881    454,105    369,271    518,999      18.5  27.6 36081  
4    AL Baldwin County     14,310     40,579     46,025     46,075       9.7  27.6 1088
1901 NY New York County   162,237    155,048    171,461    758,325        13  12.4 36061
5    AL Barbour County      4,901      6,486      4,566      2,220      27.0  35.7 20012
1894 NY Kings County      326,469    455,299    347,052    648,461      18.4  25.6 36047
6    AL    Bibb County      2,650      7,471      3,846      1,813      16.8  47.3 9012
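To make the example reproducible, the table above can be entered as a data frame. This sketch assumes the name DF (the name the dplyr answer uses) and that the count columns are meant to be numeric: the comma-formatted values are typed as strings and then converted with gsub() and as.numeric(), because sum() cannot add strings such as "37,675". Kings County's SomeCAD value is entered as 347,052, which is consistent with the SomeCAD total in the answer's output.

```r
# Sketch: rebuild the question's table as a data frame named DF.
# Count columns are typed as they appear in the printout (with commas)
# and converted to numbers below.
DF <- data.frame(
  State     = c("US", "NY", "AL", "AL", "NY", "NY", "AL", "NY", "AL", "NY", "AL"),
  Area_name = c("United States", "Richmond County", "Alabama", "Autauga County",
                "Bronx County", "Queens County", "Baldwin County",
                "New York County", "Barbour County", "Kings County", "Bibb County"),
  LessHSD   = c("26,948,057", "37,675", "470,043", "4,204", "258,956", "303,881",
                "14,310", "162,237", "4,901", "326,469", "2,650"),
  HSD       = c("59,265,308", "101,738", "1,020,172", "12,119", "255,427", "454,105",
                "40,579", "155,048", "6,486", "455,299", "7,471"),
  SomeCAD   = c("63,365,655", "81,014", "987,148", "10,552", "226,620", "369,271",
                "46,025", "171,461", "4,566", "347,052", "3,846"),  # Kings: assumed 347,052
  BDorMore  = c("68,867,051", "108,326", "822,595", "10,291", "183,134", "518,999",
                "46,075", "758,325", "2,220", "648,461", "1,813"),
  P_LessHSD = c(12.3, 11.5, 14.2, 11.3, 28, 18.5, 9.7, 13, 27.0, 18.4, 16.8),
  P_HSD     = c(27.1, 30.9, 30.9, 32.6, 27.6, 27.6, 27.6, 12.4, 35.7, 25.6, 47.3),
  ZIP       = c(1009, 36085, 1020, 7080, 36005, 36081, 1088, 36061, 20012, 36047, 9012),
  stringsAsFactors = FALSE
)

# Strip the thousands separators and convert the count columns to numeric
count_cols <- c("LessHSD", "HSD", "SomeCAD", "BDorMore")
DF[count_cols] <- lapply(DF[count_cols], function(x) as.numeric(gsub(",", "", x)))

str(DF)
```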

I would like to sum the data for the five New York City boroughs (ZIP 36005, 36047, 36061, 36081, 36085) in the columns LessHSD, HSD, SomeCAD, and BDorMore, and create a new row containing these sums with Area_name = New York Proper (see the output below).

For the columns P_LessHSD and P_HSD, I would like the new row to hold population-weighted values instead of sums. I have already calculated the weights myself from another data set: I would like to multiply Richmond County by 0.05669632, Bronx County by 0.17051732, Queens County by 0.27133878, New York County by 0.19392188, and Kings County by 0.3075256, and add the results.

Concretely, for the column P_LessHSD, this would be:

11.5*0.05669632 
+ 28*0.17051732
+ 18.5*0.27133878 
+ 13*0.19392188 
+ 18.4*0.3075256

which gives 18.6 (rounded to one decimal place). The same should be done for P_HSD. I would like the ZIP of the new row to be 55555, and I would also like to delete the five borough rows.
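The arithmetic above can be checked directly in R. This is a small sketch with the borough values typed in by hand (the vector names are illustrative, not from the question). Because the weights sum to 0.9999999, essentially 1, the plain weighted sum and weighted.mean() give the same values after rounding.

```r
# Population weights for the five boroughs, as given in the question
w <- c(Richmond = 0.05669632, Bronx = 0.17051732, Queens = 0.27133878,
       NewYork = 0.19392188, Kings = 0.3075256)

# Borough values of P_LessHSD and P_HSD, in the same order as the weights
p_less_hsd <- c(11.5, 28, 18.5, 13, 18.4)
p_hsd      <- c(30.9, 27.6, 27.6, 12.4, 25.6)

sum(w)  # about 0.9999999, so the weighted sum is already a weighted average

round(sum(p_less_hsd * w), 1)  # 18.6
round(sum(p_hsd * w), 1)       # 24.2

# weighted.mean() divides by sum(w); with these weights the rounded result is unchanged
round(weighted.mean(p_less_hsd, w), 1)  # 18.6
round(weighted.mean(p_hsd, w), 1)       # 24.2
```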

The output should be:

  State      Area_name    LessHSD        HSD    SomeCAD   BDorMore P_LessHSD P_HSD ZIP
1    US  United States 26,948,057 59,265,308 63,365,655 68,867,051      12.3  27.1 1009
2    AL        Alabama    470,043  1,020,172    987,148    822,595      14.2  30.9 1020
3    AL Autauga County      4,204     12,119     10,552     10,291      11.3  32.6 7080  
4    AL Baldwin County     14,310     40,579     46,025     46,075       9.7  27.6 1088
5    AL Barbour County      4,901      6,486      4,566      2,220      27.0  35.7 20012
6    AL    Bibb County      2,650      7,471      3,846      1,813      16.8  47.3 9012
7    NY New York Proper   1089218    1421617     895418    2217245      18.6  24.2 55555

This might help.

It uses the dplyr package, which you need to install first:

```r
install.packages("dplyr")  # only needed once
library(dplyr)

# DF holds the question's table, with LessHSD, HSD, SomeCAD and BDorMore
# stored as numbers (no thousands separators)
nyc_zips <- c(36005, 36047, 36061, 36081, 36085)

DF %>%
  filter(!(ZIP %in% nyc_zips)) %>%            # drop the five borough rows
  bind_rows(
    DF %>%
      filter(ZIP %in% nyc_zips) %>%
      mutate(wg = case_when(Area_name == "Richmond County" ~ 0.05669632,
                            Area_name == "Bronx County"    ~ 0.17051732,
                            Area_name == "Queens County"   ~ 0.27133878,
                            Area_name == "New York County" ~ 0.19392188,
                            Area_name == "Kings County"    ~ 0.3075256,
                            TRUE ~ 0),
             # weighted shares; since the weights sum to ~1, their sum is the weighted average
             P_LessHSD = wg * P_LessHSD,
             P_HSD = wg * P_HSD,
             Area_name = "New York Proper") %>%
      group_by(State, Area_name) %>%
      summarize(across(LessHSD:P_HSD, sum), .groups = "drop") %>%
      mutate(ZIP = 55555)
  )

# # A tibble: 7 x 9
#   State Area_name        LessHSD      HSD  SomeCAD BDorMore P_LessHSD P_HSD   ZIP
#   <chr> <chr>              <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl> <dbl>
# 1 US    United States   26948057 59265308 63365655 68867051      12.3  27.1  1009
# 2 AL    Alabama           470043  1020172   987148   822595      14.2  30.9  1020
# 3 AL    Autauga County      4204    12119    10552    10291      11.3  32.6  7080
# 4 AL    Baldwin County     14310    40579    46025    46075       9.7  27.6  1088
# 5 AL    Barbour County      4901     6486     4566     2220      27    35.7 20012
# 6 AL    Bibb County         2650     7471     3846     1813      16.8  47.3  9012
# 7 NY    New York Proper  1089218  1421617  1195418  2217245      18.6  24.2 55555
```

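PS. This gives a different result for SomeCAD than the expected output: summing the five boroughs' values (81,014 + 226,620 + 369,271 + 171,461 + 347,052) gives 1,195,418, whereas the expected 895,418 corresponds to a Kings County value of 47,052.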
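For comparison, here is a sketch of the same steps in base R, without dplyr. It is not the answerer's code, and it makes a few assumptions: DF is a hypothetical seven-row subset of the question's table (the five boroughs plus United States and Alabama), the weights are keyed by the boroughs' ZIP values, weighted.mean() is used for P_LessHSD and P_HSD, and those two values are rounded to one decimal place to match the question's formatting.

```r
# Hypothetical subset of the question's table, counts already numeric
DF <- data.frame(
  State     = c("US", "NY", "NY", "NY", "NY", "NY", "AL"),
  Area_name = c("United States", "Richmond County", "Bronx County",
                "Queens County", "New York County", "Kings County", "Alabama"),
  LessHSD   = c(26948057, 37675, 258956, 303881, 162237, 326469, 470043),
  HSD       = c(59265308, 101738, 255427, 454105, 155048, 455299, 1020172),
  SomeCAD   = c(63365655, 81014, 226620, 369271, 171461, 347052, 987148),
  BDorMore  = c(68867051, 108326, 183134, 518999, 758325, 648461, 822595),
  P_LessHSD = c(12.3, 11.5, 28, 18.5, 13, 18.4, 14.2),
  P_HSD     = c(27.1, 30.9, 27.6, 27.6, 12.4, 25.6, 30.9),
  ZIP       = c(1009, 36085, 36005, 36081, 36061, 36047, 1020),
  stringsAsFactors = FALSE
)

# Borough weights keyed by ZIP (names must be strings)
nyc_weights <- c("36085" = 0.05669632, "36005" = 0.17051732, "36081" = 0.27133878,
                 "36061" = 0.19392188, "36047" = 0.3075256)

is_nyc <- DF$ZIP %in% as.numeric(names(nyc_weights))
nyc    <- DF[is_nyc, ]
w      <- nyc_weights[as.character(nyc$ZIP)]  # weights aligned with the rows of nyc

count_cols <- c("LessHSD", "HSD", "SomeCAD", "BDorMore")
new_row <- data.frame(
  State     = "NY",
  Area_name = "New York Proper",
  as.list(colSums(nyc[count_cols])),          # expands into one column per count
  P_LessHSD = round(weighted.mean(nyc$P_LessHSD, w), 1),
  P_HSD     = round(weighted.mean(nyc$P_HSD, w), 1),
  ZIP       = 55555,
  stringsAsFactors = FALSE
)

# Drop the borough rows and append the combined row (rbind matches columns by name)
result <- rbind(DF[!is_nyc, ], new_row)
rownames(result) <- NULL
result
```

Keeping the weights in one named vector means the same object decides which rows are collapsed and how they are weighted. weighted.mean() divides by sum(w), so the result stays a weighted average even when the weights do not add up to exactly 1.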
