How to sum and weight certain rows in a data frame in R?
I currently have a data.frame that looks like this:
State Area_name LessHSD HSD SomeCAD BDorMore P_LessHSD P_HSD ZIP
1 US United States 26,948,057 59,265,308 63,365,655 68,867,051 12.3 27.1 1009
1913 NY Richmond County 37,675 101,738 81,014 108,326 11.5 30.9 36085
2 AL Alabama 470,043 1,020,172 987,148 822,595 14.2 30.9 1020
3 AL Autauga County 4,204 12,119 10,552 10,291 11.3 32.6 7080
1873 NY Bronx County 258,956 255,427 226,620 183,134 28 27.6 36005
1911 NY Queens County 303,881 454,105 369,271 518,999 18.5 27.6 36081
4 AL Baldwin County 14,310 40,579 46,025 46,075 9.7 27.6 1088
1901 NY New York County 162,237 155,048 171,461 758,325 13 12.4 36061
5 AL Barbour County 4,901 6,486 4,566 2,220 27.0 35.7 20012
1894 NY Kings County 326,469 455,299 347,052 648,461 18.4 25.6 36047
6 AL Bibb County 2,650 7,471 3,846 1,813 16.8 47.3 9012
I want to sum the columns LessHSD, HSD, and SomeCAD for the 5 New York City boroughs (ZIP 36005, 36047, 36061, 36081, 36085) and create a new row from these sums with Area_name = New York Proper (see the output below).
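For the columns P_LessHSD and P_HSD, I want to weight these variables by population and put the result in the new row. I have already calculated my own weights from another dataset. I want to multiply Richmond County by 0.05669632, Bronx County by 0.17051732, Queens County by 0.27133878, New York County by 0.19392188, and Kings County by 0.3075256.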
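For the column P_LessHSD, this works out to:
11.5*0.05669632 + 28*0.17051732 + 18.5*0.27133878 + 13*0.19392188 + 18.4*0.3075256
which gives 18.6 (rounded to the nearest tenth). The same applies to P_HSD. I want the new row's ZIP to be 55555. I also want to remove all 5 borough rows.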
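As a quick check of the arithmetic above, the weighted sums can be reproduced in base R. This is only a sketch: the vector names `p_less_hsd`, `p_hsd`, and `w` are made up for illustration, with the values copied from the table and the weights listed above.

```r
# Percentages for Richmond, Bronx, Queens, New York, and Kings counties (from the table)
p_less_hsd <- c(11.5, 28, 18.5, 13, 18.4)
p_hsd      <- c(30.9, 27.6, 27.6, 12.4, 25.6)

# Population weights, in the same county order
w <- c(0.05669632, 0.17051732, 0.27133878, 0.19392188, 0.3075256)

# Weighted sums, rounded to one decimal place
weighted_less_hsd <- round(sum(p_less_hsd * w), 1)
weighted_hsd      <- round(sum(p_hsd * w), 1)

print(weighted_less_hsd)  # 18.6
print(weighted_hsd)       # 24.2
```

Because these weights sum to approximately 1, `weighted.mean(p_less_hsd, w)` should give the same value after rounding.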
The output should be:
State Area_name LessHSD HSD SomeCAD BDorMore P_LessHSD P_HSD ZIP
1 US United States 26,948,057 59,265,308 63,365,655 68,867,051 12.3 27.1 1009
2 AL Alabama 470,043 1,020,172 987,148 822,595 14.2 30.9 1020
3 AL Autauga County 4,204 12,119 10,552 10,291 11.3 32.6 7080
4 AL Baldwin County 14,310 40,579 46,025 46,075 9.7 27.6 1088
5 AL Barbour County 4,901 6,486 4,566 2,220 27.0 35.7 20012
6 AL Bibb County 2,650 7,471 3,846 1,813 16.8 47.3 9012
7 NY New York Proper 1089218 1421617 895418 2217245 18.6 24.2 55555
This may help. It uses the dplyr package, which you need to install first:
install.packages("dplyr")  # only needed once
library(dplyr)

# Assumes the count and percentage columns in DF are numeric
nyc_zips <- c(36005, 36047, 36061, 36081, 36085)

DF %>%
  # drop the five borough rows
  filter(!(ZIP %in% nyc_zips)) %>%
  bind_rows(
    DF %>%
      filter(ZIP %in% nyc_zips) %>%
      # attach each borough's weight and scale the percentage columns by it
      mutate(wg = case_when(Area_name == "Richmond County" ~ 0.05669632,
                            Area_name == "Bronx County" ~ 0.17051732,
                            Area_name == "Queens County" ~ 0.27133878,
                            Area_name == "New York County" ~ 0.19392188,
                            Area_name == "Kings County" ~ 0.3075256,
                            TRUE ~ 0),
             P_LessHSD = wg * P_LessHSD,
             P_HSD = wg * P_HSD,
             Area_name = "New York Proper") %>%
      group_by(State, Area_name) %>%
      # sum the counts and the weighted percentages into a single row
      summarize_at(vars(LessHSD:P_HSD), sum) %>%
      mutate(ZIP = 55555)
  )
# # A tibble: 7 x 9
# State Area_name LessHSD HSD SomeCAD BDorMore P_LessHSD P_HSD ZIP
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 US United States 26948057 59265308 63365655 68867051 12.3 27.1 1009
# 2 AL Alabama 470043 1020172 987148 822595 14.2 30.9 1020
# 3 AL Autauga County 4204 12119 10552 10291 11.3 32.6 7080
# 4 AL Baldwin County 14310 40579 46025 46075 9.7 27.6 1088
# 5 AL Barbour County 4901 6486 4566 2220 27 35.7 20012
# 6 AL Bibb County 2650 7471 3846 1813 16.8 47.3 9012
# 7 NY New York Proper 1089218 1421617 1195418 2217245 18.6 24.2 55555
P.S. It gives a different result for SomeCAD (1195418, rather than the 895418 shown in the expected output).
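One caveat, offered as an assumption rather than something stated in the question: if the count columns were read in as text with thousands separators (as they are displayed, e.g. "37,675"), `sum()` will not work on them until they are converted. Below is a minimal base R sketch of that conversion, using a small made-up data frame `DF_raw` holding two rows from the table.

```r
# Hypothetical two-row excerpt with counts stored as comma-formatted strings
DF_raw <- data.frame(
  Area_name = c("Richmond County", "Bronx County"),
  LessHSD   = c("37,675", "258,956"),
  HSD       = c("101,738", "255,427"),
  stringsAsFactors = FALSE
)

count_cols <- c("LessHSD", "HSD")

# Strip the commas, then convert each count column to numeric
DF_raw[count_cols] <- lapply(
  DF_raw[count_cols],
  function(x) as.numeric(gsub(",", "", x))
)

total_less_hsd <- sum(DF_raw$LessHSD)
total_hsd      <- sum(DF_raw$HSD)
```

The same `lapply()` call could be applied to all four count columns of the real data before running the dplyr pipeline.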