
How to sum and weight certain rows in a dataframe in R?

I currently have a data.frame as follows:

  State      Area_name    LessHSD        HSD    SomeCAD   BDorMore P_LessHSD P_HSD ZIP
1    US  United States 26,948,057 59,265,308 63,365,655 68,867,051      12.3  27.1 1009
1913 NY Richmond County    37,675    101,738     81,014    108,326      11.5  30.9 36085
2    AL        Alabama    470,043  1,020,172    987,148    822,595      14.2  30.9 1020
3    AL Autauga County      4,204     12,119     10,552     10,291      11.3  32.6 7080
1873 NY Bronx County      258,956    255,427    226,620    183,134       28   27.6 36005
1911 NY Queens County     303,881    454,105    369,271    518,999      18.5  27.6 36081  
4    AL Baldwin County     14,310     40,579     46,025     46,075       9.7  27.6 1088
1901 NY New York County   162,237    155,048    171,461    758,325        13  12.4 36061
5    AL Barbour County      4,901      6,486      4,566      2,220      27.0  35.7 20012
1894 NY Kings County      326,469    455,299    347,052    648,461      18.4  25.6 36047
6    AL    Bibb County      2,650      7,471      3,846      1,813      16.8  47.3 9012
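As printed, the counts contain thousands separators. If they were read into R that way they would be character columns, and `sum()` would fail on them. A minimal sketch for reproducing the data by hand and converting those columns to numbers, assuming the name `DF` (the name the answer's code uses) and reading the partly garbled Kings County SomeCAD value as 347,052:

```r
# Hypothetical hand-typed copy of the question's data; the counts are kept
# as strings with thousands separators, exactly as they appear above.
DF <- data.frame(
  State = c("US", "NY", "AL", "AL", "NY", "NY", "AL", "NY", "AL", "NY", "AL"),
  Area_name = c("United States", "Richmond County", "Alabama", "Autauga County",
                "Bronx County", "Queens County", "Baldwin County",
                "New York County", "Barbour County", "Kings County", "Bibb County"),
  LessHSD  = c("26,948,057", "37,675", "470,043", "4,204", "258,956", "303,881",
               "14,310", "162,237", "4,901", "326,469", "2,650"),
  HSD      = c("59,265,308", "101,738", "1,020,172", "12,119", "255,427", "454,105",
               "40,579", "155,048", "6,486", "455,299", "7,471"),
  SomeCAD  = c("63,365,655", "81,014", "987,148", "10,552", "226,620", "369,271",
               "46,025", "171,461", "4,566", "347,052", "3,846"),
  BDorMore = c("68,867,051", "108,326", "822,595", "10,291", "183,134", "518,999",
               "46,075", "758,325", "2,220", "648,461", "1,813"),
  P_LessHSD = c(12.3, 11.5, 14.2, 11.3, 28, 18.5, 9.7, 13, 27.0, 18.4, 16.8),
  P_HSD     = c(27.1, 30.9, 30.9, 32.6, 27.6, 27.6, 27.6, 12.4, 35.7, 25.6, 47.3),
  ZIP = c(1009, 36085, 1020, 7080, 36005, 36081, 1088, 36061, 20012, 36047, 9012),
  stringsAsFactors = FALSE
)

# Strip the commas and convert the count columns to numeric so sum() works.
count_cols <- c("LessHSD", "HSD", "SomeCAD", "BDorMore")
DF[count_cols] <- lapply(DF[count_cols], function(x) as.numeric(gsub(",", "", x)))
```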

I want to sum the columns LessHSD, HSD, and SomeCAD over the five New York City boroughs (ZIPs 36005, 36047, 36061, 36081, 36085) and create a new row holding these sums, with Area_name = New York Proper (see the output below).

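For the columns P_LessHSD and P_HSD, I want to combine the boroughs into the new row using population weights. I have already calculated my own weights from another data set: Richmond County should be multiplied by 0.05669632, Bronx County by 0.17051732, Queens County by 0.27133878, New York County by 0.19392188, and Kings County by 0.3075256.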
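One way to keep these weights in R is a named numeric vector keyed by Area_name, so each borough's weight can be looked up by name instead of through a chain of conditions. A small sketch; the name `borough_weights` is hypothetical:

```r
# Borough weights from the question, keyed by Area_name (name is hypothetical).
borough_weights <- c(
  "Richmond County" = 0.05669632,
  "Bronx County"    = 0.17051732,
  "Queens County"   = 0.27133878,
  "New York County" = 0.19392188,
  "Kings County"    = 0.3075256
)

# Indexing a named vector with a character vector returns the matching
# weights; unname() drops the names so the result is a plain numeric vector.
areas <- c("Queens County", "Bronx County")
unname(borough_weights[areas])
```

The five weights add up to 0.9999999, so a weighted sum with them is, for practical purposes, the same as a weighted mean.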

For example, for column P_LessHSD this works out to:

11.5*0.05669632 
+ 28*0.17051732
+ 18.5*0.27133878 
+ 13*0.19392188 
+ 18.4*0.3075256

This gives 18.6 (rounded to one decimal place). The same applies to P_HSD. I want the new row's ZIP to be 55555, and I also want to delete all five borough rows.
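The same arithmetic in R is an element-wise product followed by `sum()`. A short sketch using the borough values from the data, in the order Richmond, Bronx, Queens, New York, Kings (the variable names are only illustrative):

```r
# Weights and borough percentages, ordered Richmond, Bronx, Queens,
# New York, Kings.
w       <- c(0.05669632, 0.17051732, 0.27133878, 0.19392188, 0.3075256)
lesshsd <- c(11.5, 28, 18.5, 13, 18.4)
hsd     <- c(30.9, 27.6, 27.6, 12.4, 25.6)

# Weighted sum: multiply element-wise, then add up.
p_lesshsd <- sum(lesshsd * w)
p_hsd     <- sum(hsd * w)

# Rounded to one decimal place: P_LessHSD = 18.6, P_HSD = 24.2
round(c(P_LessHSD = p_lesshsd, P_HSD = p_hsd), 1)
```

Base R's `weighted.mean(lesshsd, w)` divides by `sum(w)`; because the weights sum to 0.9999999, it gives the same rounded result here.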

The output should be:

  State      Area_name    LessHSD        HSD    SomeCAD   BDorMore P_LessHSD P_HSD ZIP
1    US  United States 26,948,057 59,265,308 63,365,655 68,867,051      12.3  27.1 1009
2    AL        Alabama    470,043  1,020,172    987,148    822,595      14.2  30.9 1020
3    AL Autauga County      4,204     12,119     10,552     10,291      11.3  32.6 7080  
4    AL Baldwin County     14,310     40,579     46,025     46,075       9.7  27.6 1088
5    AL Barbour County      4,901      6,486      4,566      2,220      27.0  35.7 20012
6    AL    Bibb County      2,650      7,471      3,846      1,813      16.8  47.3 9012
7    NY New York Proper   1089218    1421617     895418    2217245      18.6  24.2 55555

This may help.

It uses the dplyr package, which you need to install first:

install.packages("dplyr")
library(dplyr)

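# Assumes DF's count columns are numeric (thousands separators removed).
DF %>% 
  # keep every row except the five NYC boroughs
  filter(!(ZIP %in% c(36005,36047,36061,36081,36085))) %>%
  bind_rows(
        DF %>%
          # take only the five boroughs
          filter(ZIP %in% c(36005,36047,36061,36081,36085)) %>%
          # attach each borough's weight and pre-multiply the percentage columns
          mutate(wg = case_when(Area_name == "Richmond County" ~ 0.05669632, 
                                Area_name == "Bronx County" ~ 0.17051732,
                                Area_name == "Queens County" ~ 0.27133878,
                                Area_name == "New York County" ~ 0.19392188, 
                                Area_name == "Kings County" ~ 0.3075256,
                                TRUE ~ 0),
                 P_LessHSD = wg*P_LessHSD,
                 P_HSD = wg*P_HSD,
                 Area_name = "New York Proper") %>%
          # collapse the five rows into one: counts are summed, and summing the
          # pre-weighted percentages yields the weighted sum
          group_by(State, Area_name) %>%
          summarize_at(vars(LessHSD:P_HSD), sum) %>%
          mutate(ZIP = 55555) )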

# # A tibble: 7 x 9
#   State Area_name        LessHSD      HSD  SomeCAD BDorMore P_LessHSD P_HSD   ZIP
#   <chr> <chr>              <dbl>    <dbl>    <dbl>    <dbl>     <dbl> <dbl> <dbl>
# 1 US    United States   26948057 59265308 63365655 68867051      12.3  27.1  1009
# 2 AL    Alabama           470043  1020172   987148   822595      14.2  30.9  1020
# 3 AL    Autauga County      4204    12119    10552    10291      11.3  32.6  7080
# 4 AL    Baldwin County     14310    40579    46025    46075       9.7  27.6  1088
# 5 AL    Barbour County      4901     6486     4566     2220      27    35.7 20012
# 6 AL    Bibb County         2650     7471     3846     1813      16.8  47.3  9012
# 7 NY    New York Proper  1089218  1421617  1195418  2217245      18.6  24.2 55555
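`summarize_at()` still works but has been superseded by `across()` since dplyr 1.0. A sketch of the same steps written with `across()` and a named weight vector, assuming dplyr 1.0 or later; the trimmed-down `DF` (one non-borough row plus the five boroughs, counts already numeric) and the names `boroughs`, `borough_weights`, `nyc`, and `result` are hypothetical:

```r
library(dplyr)  # assumes dplyr >= 1.0 for across()

# Hypothetical, trimmed-down DF with numeric counts: one non-borough row
# plus the five NYC boroughs from the question.
DF <- data.frame(
  State     = c("AL", "NY", "NY", "NY", "NY", "NY"),
  Area_name = c("Bibb County", "Richmond County", "Bronx County",
                "Queens County", "New York County", "Kings County"),
  LessHSD   = c(2650, 37675, 258956, 303881, 162237, 326469),
  HSD       = c(7471, 101738, 255427, 454105, 155048, 455299),
  SomeCAD   = c(3846, 81014, 226620, 369271, 171461, 347052),
  BDorMore  = c(1813, 108326, 183134, 518999, 758325, 648461),
  P_LessHSD = c(16.8, 11.5, 28, 18.5, 13, 18.4),
  P_HSD     = c(47.3, 30.9, 27.6, 27.6, 12.4, 25.6),
  ZIP       = c(9012, 36085, 36005, 36081, 36061, 36047),
  stringsAsFactors = FALSE
)

boroughs <- c(36005, 36047, 36061, 36081, 36085)
borough_weights <- c("Richmond County" = 0.05669632, "Bronx County" = 0.17051732,
                     "Queens County" = 0.27133878, "New York County" = 0.19392188,
                     "Kings County" = 0.3075256)

# Collapse the five boroughs into a single "New York Proper" row.
nyc <- DF %>%
  filter(ZIP %in% boroughs) %>%
  mutate(wg = unname(borough_weights[Area_name])) %>%
  summarise(
    State     = "NY",
    Area_name = "New York Proper",
    across(LessHSD:BDorMore, sum),                # plain sums of the counts
    across(c(P_LessHSD, P_HSD), ~ sum(.x * wg)),  # population-weighted sums
    ZIP       = 55555
  )

# Drop the borough rows and append the combined row.
result <- DF %>%
  filter(!(ZIP %in% boroughs)) %>%
  bind_rows(nyc)
```

Looking weights up by name avoids the `TRUE ~ 0` fallback of `case_when()`; a borough missing from `borough_weights` would produce `NA` rather than silently contributing 0.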

P.S. It gives a different result for SomeCAD.
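The discrepancy can be checked directly. A quick sketch that adds up the five boroughs' SomeCAD values as listed in the data (Richmond, Bronx, Queens, New York, Kings), assuming the Kings County value is 347,052:

```r
# SomeCAD for Richmond, Bronx, Queens, New York, Kings (from the data).
somecad <- c(81014, 226620, 369271, 171461, 347052)

sum(somecad)           # 1195418, the value in the dplyr output
sum(somecad) - 895418  # 300000 more than the expected output shows
```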


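Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.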

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM