How to sum and weight certain rows in a dataframe in R?
I currently have a data.frame like this:
State Area_name LessHSD HSD SomeCAD BDorMore P_LessHSD P_HSD ZIP
1 US United States 26,948,057 59,265,308 63,365,655 68,867,051 12.3 27.1 1009
1913 NY Richmond County 37,675 101,738 81,014 108,326 11.5 30.9 36085
2 AL Alabama 470,043 1,020,172 987,148 822,595 14.2 30.9 1020
3 AL Autauga County 4,204 12,119 10,552 10,291 11.3 32.6 7080
1873 NY Bronx County 258,956 255,427 226,620 183,134 28 27.6 36005
1911 NY Queens County 303,881 454,105 369,271 518,999 18.5 27.6 36081
4 AL Baldwin County 14,310 40,579 46,025 46,075 9.7 27.6 1088
1901 NY New York County 162,237 155,048 171,461 758,325 13 12.4 36061
5 AL Barbour County 4,901 6,486 4,566 2,220 27.0 35.7 20012
1894 NY Kings County 326,469 455,299 347,052 648,461 18.4 25.6 36047
6 AL Bibb County 2,650 7,471 3,846 1,813 16.8 47.3 9012
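The counts in this table are printed with thousands separators. If they are stored as character strings rather than numbers, they would need converting before any summing. A minimal sketch of that conversion, assuming text columns; the helper name `to_number` and the two-row `sample_df` are illustrative and not part of the original data frame:

```r
# Hypothetical helper: strip thousands separators and convert to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Two rows copied from the table, stored as character for illustration.
sample_df <- data.frame(
  Area_name = c("Richmond County", "Bronx County"),
  LessHSD   = c("37,675", "258,956"),
  HSD       = c("101,738", "255,427"),
  stringsAsFactors = FALSE
)

# Convert only the count columns, leaving the names untouched.
count_cols <- c("LessHSD", "HSD")
sample_df[count_cols] <- lapply(sample_df[count_cols], to_number)

str(sample_df)
```

If the columns are already numeric, this step can be skipped.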
I want to sum the columns LessHSD, HSD, and SomeCAD for the 5 New York City boroughs (ZIP 36005, 36047, 36061, 36081, 36085) and create a new row from these sums with Area_name = New York Proper (see the output below).
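For the columns P_LessHSD and P_HSD, I want to weight these variables by population when combining them into the new row. I have already calculated my own weights from another dataset. I want to multiply Richmond County by 0.05669632, Bronx County by 0.17051732, Queens County by 0.27133878, New York County by 0.19392188, and Kings County by 0.3075256.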
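Explicitly, for the column P_LessHSD this looks like:
11.5*0.05669632 + 28*0.17051732 + 18.5*0.27133878 + 13*0.19392188 + 18.4*0.3075256
which gives 18.6 (when rounded to the nearest tenth). The same applies to P_HSD. I want the new row's ZIP to be 55555. I also want to delete all 5 borough rows.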
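The weighted-sum arithmetic can be checked directly in base R. This is a small sketch using the weights and percentages quoted in the question; the vector names `w`, `p_less_hsd`, and `p_hsd` are chosen here for illustration:

```r
# Weights from the question, in the order Richmond, Bronx, Queens, New York, Kings.
w <- c(0.05669632, 0.17051732, 0.27133878, 0.19392188, 0.3075256)

# Percentages for the same five counties, copied from the table.
p_less_hsd <- c(11.5, 28, 18.5, 13, 18.4)
p_hsd      <- c(30.9, 27.6, 27.6, 12.4, 25.6)

# Weighted sums, rounded to one decimal place.
round(sum(p_less_hsd * w), 1)  # 18.6
round(sum(p_hsd * w), 1)       # 24.2
```

The weights add up to 1 (to within rounding), so the weighted sum is effectively a weighted mean.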
The output should be:
State Area_name LessHSD HSD SomeCAD BDorMore P_LessHSD P_HSD ZIP
1 US United States 26,948,057 59,265,308 63,365,655 68,867,051 12.3 27.1 1009
2 AL Alabama 470,043 1,020,172 987,148 822,595 14.2 30.9 1020
3 AL Autauga County 4,204 12,119 10,552 10,291 11.3 32.6 7080
4 AL Baldwin County 14,310 40,579 46,025 46,075 9.7 27.6 1088
5 AL Barbour County 4,901 6,486 4,566 2,220 27.0 35.7 20012
6 AL Bibb County 2,650 7,471 3,846 1,813 16.8 47.3 9012
7 NY New York Proper 1089218 1421617 895418 2217245 18.6 24.2 55555
This may help. It uses the dplyr package, which you need to install first.
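install.packages("dplyr")  # only needed once
library(dplyr)

# Assumes the count and percentage columns in DF are numeric.
# ZIP codes of the five boroughs to be merged.
nyc_zips <- c(36005, 36047, 36061, 36081, 36085)

DF %>%
  # Keep every row except the five boroughs.
  filter(!(ZIP %in% nyc_zips)) %>%
  bind_rows(
    DF %>%
      filter(ZIP %in% nyc_zips) %>%
      # Attach each borough's weight, then scale the percentage columns by it.
      mutate(wg = case_when(Area_name == "Richmond County" ~ 0.05669632,
                            Area_name == "Bronx County"    ~ 0.17051732,
                            Area_name == "Queens County"   ~ 0.27133878,
                            Area_name == "New York County" ~ 0.19392188,
                            Area_name == "Kings County"    ~ 0.3075256,
                            TRUE ~ 0),
             P_LessHSD = wg * P_LessHSD,
             P_HSD = wg * P_HSD,
             Area_name = "New York Proper") %>%
      # Collapse the five rows into one by summing every column from LessHSD to P_HSD.
      group_by(State, Area_name) %>%
      summarize_at(vars(LessHSD:P_HSD), sum) %>%
      mutate(ZIP = 55555)
  )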
# # A tibble: 7 x 9
# State Area_name LessHSD HSD SomeCAD BDorMore P_LessHSD P_HSD ZIP
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 US United States 26948057 59265308 63365655 68867051 12.3 27.1 1009
# 2 AL Alabama 470043 1020172 987148 822595 14.2 30.9 1020
# 3 AL Autauga County 4204 12119 10552 10291 11.3 32.6 7080
# 4 AL Baldwin County 14310 40579 46025 46075 9.7 27.6 1088
# 5 AL Barbour County 4901 6486 4566 2220 27 35.7 20012
# 6 AL Bibb County 2650 7471 3846 1813 16.8 47.3 9012
# 7 NY New York Proper 1089218 1421617 1195418 2217245 18.6 24.2 55555
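Note: this gives a different result for SomeCAD (1195418, versus 895418 in the expected output in the question). The gap of exactly 300,000 is consistent with Kings County's SomeCAD being 347,052 rather than 47,052.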
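For comparison, the same steps can be sketched in base R without dplyr. This is only an illustration under stated assumptions: the columns are already numeric, the small data frame `ny` holds just the five borough rows (values copied from the question, with Kings County's SomeCAD taken as 347052), and the names `ny`, `weights`, and `new_row` are made up for the example.

```r
# Five borough rows from the question (Kings County SomeCAD assumed to be 347052).
ny <- data.frame(
  State     = "NY",
  Area_name = c("Richmond County", "Bronx County", "Queens County",
                "New York County", "Kings County"),
  LessHSD   = c(37675, 258956, 303881, 162237, 326469),
  HSD       = c(101738, 255427, 454105, 155048, 455299),
  SomeCAD   = c(81014, 226620, 369271, 171461, 347052),
  BDorMore  = c(108326, 183134, 518999, 758325, 648461),
  P_LessHSD = c(11.5, 28, 18.5, 13, 18.4),
  P_HSD     = c(30.9, 27.6, 27.6, 12.4, 25.6),
  ZIP       = c(36085, 36005, 36081, 36061, 36047),
  stringsAsFactors = FALSE
)

# Weights from the question, looked up by county name.
weights <- c("Richmond County" = 0.05669632, "Bronx County" = 0.17051732,
             "Queens County" = 0.27133878, "New York County" = 0.19392188,
             "Kings County" = 0.3075256)
w <- unname(weights[ny$Area_name])

count_cols <- c("LessHSD", "HSD", "SomeCAD", "BDorMore")
pct_cols   <- c("P_LessHSD", "P_HSD")

# Plain sums for the counts, weighted sums (one decimal) for the percentages.
count_sums <- colSums(ny[count_cols])
pct_sums   <- round(sapply(ny[pct_cols], function(p) sum(p * w)), 1)

# Assemble the single replacement row.
new_row <- data.frame(
  State = "NY", Area_name = "New York Proper",
  as.list(count_sums), as.list(pct_sums),
  ZIP = 55555,
  stringsAsFactors = FALSE
)

new_row
```

With the full data frame `DF` from the question, the borough rows could then be dropped and the new row appended with something like `rbind(DF[!DF$ZIP %in% ny$ZIP, ], new_row)`, provided the column names and types match.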