
Data frame aggregating across rows

Given the following data frame df:

text <- "
State,District,County,Num Voters,Total Votes in State,Votes for None,Candidate Name,Party,Votes Scored
CA,San Diego,Delmar,190962,48026634,2511,A1,IND,949
CA,San Diego,Delmar,190962,48026634,2511,A2,RP(K),44815
CA,San Diego,Delmar,190962,48026634,2511,A3,IND,1036
CA,San Diego,Delmar,190962,48026634,2511,A4,DEM,29235
CA,San Diego,Delmar,190962,48026634,2511,A5,IND,5064
CA,San Diego,Delmar,190962,48026634,2511,A6,IND,803
CA,San Diego,Delmar,190962,48026634,2511,A7,REP,22329
CA,San Diego,Delmar,190962,48026634,2511,A8,BSP,43553
CA,San Diego,La Jolla,190257,48026634,3629,A1,IND,972
CA,San Diego,La Jolla,190257,48026634,3629,A2,RP(K),66168
CA,San Diego,La Jolla,190257,48026634,3629,A3,IND,2763
CA,San Diego,La Jolla,190257,48026634,3629,A4,DEM,32792
CA,San Diego,La Jolla,190257,48026634,3629,A5,IND,8629
CA,San Diego,La Jolla,190257,48026634,3629,A6,IND,1191
CA,San Diego,La Jolla,190257,48026634,3629,A7,REP,28002
CA,San Diego,La Jolla,190257,48026634,3629,A8,BSP,2555
"
df <- read.table(textConnection(text), sep = ",", header = TRUE)

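My data contains five parties: IND, RP(K), DEM, REP and BSP. I would like to create two new score columns:

  • DRP: DEM score + RP(K) score
  • RSP: REP score + BSP score

In addition, I would like to add columns that aggregate these scores at the district level and at the county level.

How can I best do this with dplyr? I was thinking of group_by(), but I have not figured out its logic yet.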

With dplyr you can do the following.

library(dplyr)

tg <- df %>%
  # County-level totals, repeated on every row of that county
  group_by(County) %>%
  mutate(DRP_county = sum(Votes.Scored[Party == "RP(K)" | Party == "DEM"]),
         RSP_county = sum(Votes.Scored[Party == "REP" | Party == "BSP"])) %>%
  ungroup() %>%
  # District-level totals, repeated on every row of that district
  group_by(District) %>%
  mutate(DRP_district = sum(Votes.Scored[Party == "RP(K)" | Party == "DEM"]),
         RSP_district = sum(Votes.Scored[Party == "REP" | Party == "BSP"]))

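Note: I think it is best to keep everything in the same data frame, although that of course depends on the size of the data. Also, for further analysis of the data frame and for modelling or visualisation, it is better to use mutate rather than summarise, even though summarise would give cleaner output.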
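For comparison, here is a minimal sketch of the summarise variant mentioned in the note. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and rebuilds only the columns it needs from the question's data; the name `county_totals` is made up for illustration.

```r
library(dplyr)

# Trimmed copy of the question's data: only the columns used below.
df <- data.frame(
  District = "San Diego",
  County = rep(c("Delmar", "La Jolla"), each = 8),
  Party = rep(c("IND", "RP(K)", "IND", "DEM", "IND", "IND", "REP", "BSP"), 2),
  Votes.Scored = c(949, 44815, 1036, 29235, 5064, 803, 22329, 43553,
                   972, 66168, 2763, 32792, 8629, 1191, 28002, 2555),
  stringsAsFactors = FALSE
)

# summarise() collapses each county to one row, whereas mutate() would
# keep all 16 candidate rows and repeat the totals on each of them.
county_totals <- df %>%
  group_by(District, County) %>%
  summarise(DRP_county = sum(Votes.Scored[Party %in% c("RP(K)", "DEM")]),
            RSP_county = sum(Votes.Scored[Party %in% c("REP", "BSP")]),
            .groups = "drop")

county_totals
```

The result has one row per county (two rows here), which is easier to read but no longer carries the per-candidate rows needed for later modelling.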

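Also, you could skip ungroup(), since a later group_by() replaces the existing grouping by default, but I believe it is safer to include it.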
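The remark about ungroup() can be checked with a small sketch. It assumes dplyr 1.0.0 or later, where the argument is spelled `.add`, and uses a two-row excerpt of the question's data; the names `by_county`, `regrouped` and `added` are made up for illustration.

```r
library(dplyr)

# Two rows taken from the question's data (candidate A1 in each county).
df <- data.frame(
  District = "San Diego",
  County = c("Delmar", "La Jolla"),
  Votes.Scored = c(949, 972),
  stringsAsFactors = FALSE
)

by_county <- df %>% group_by(County)

# By default a second group_by() replaces the existing grouping ...
regrouped <- by_county %>% group_by(District)

# ... and only .add = TRUE appends to it instead.
added <- by_county %>% group_by(District, .add = TRUE)

group_vars(regrouped)  # "District"
group_vars(added)      # "County" "District"
```

So the explicit ungroup() does not change the result here; it mainly makes the intent clear to readers.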

Using dplyr, if you only need two columns holding the district-level and county-level sums for each party group:

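library(dplyr)

df %>%
  # Collapse DEM and RP(K) into DRP, REP and BSP into RSP; keep other parties as-is
  mutate(Party2 = ifelse(Party == "DEM" | Party == "RP(K)", "DRP",
                         ifelse(Party == "REP" | Party == "BSP", "RSP", as.character(Party)))) %>%
  # Total per party group within each district
  group_by(District, Party2) %>%
  mutate(Votes.Scored.District = sum(Votes.Scored)) %>%
  ungroup() %>%
  # Total per party group within each county
  group_by(County, Party2) %>%
  mutate(Votes.Scored.County = sum(Votes.Scored))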

Or, if you want overall statistics for the party groups at district and county level:

library(dplyr)

df %>%
  # Collapse DEM and RP(K) into DRP, REP and BSP into RSP; keep other parties as-is
  mutate(Party2 = ifelse(Party == "DEM" | Party == "RP(K)", "DRP",
                         ifelse(Party == "REP" | Party == "BSP", "RSP", as.character(Party)))) %>%
  group_by(District, Party2) %>%
  mutate(Votes.Scored.District = sum(Votes.Scored)) %>%
  ungroup() %>%
  group_by(County, Party2) %>%
  mutate(Votes.Scored.County = sum(Votes.Scored)) %>%
  # One row per party group; min() keeps the smallest district and county total
  group_by(Party2) %>%
  summarise(Votes.Scored.District = min(Votes.Scored.District),
            Votes.Scored.County = min(Votes.Scored.County))

# A tibble: 3 x 3
  Party2 Votes.Scored.District Votes.Scored.County
  <chr>                  <dbl>               <dbl>
1 DRP                  173010.              74050.
2 IND                   21407.               7852.
3 RSP                   96439.              30557.
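Because the summary uses min(), the Votes.Scored.County column shows only the smaller of the two county totals for each party group (74050, for example, is Delmar's DRP total). If every county's total is wanted instead, one possible sketch is to summarise by County and Party2 directly. It assumes dplyr 1.0.0 or later, swaps the nested ifelse() for case_when(), and rebuilds only the needed columns from the question's data; the name `party_totals` is made up for illustration.

```r
library(dplyr)

# Trimmed copy of the question's data: only the columns used below.
df <- data.frame(
  County = rep(c("Delmar", "La Jolla"), each = 8),
  Party = rep(c("IND", "RP(K)", "IND", "DEM", "IND", "IND", "REP", "BSP"), 2),
  Votes.Scored = c(949, 44815, 1036, 29235, 5064, 803, 22329, 43553,
                   972, 66168, 2763, 32792, 8629, 1191, 28002, 2555),
  stringsAsFactors = FALSE
)

party_totals <- df %>%
  # Same regrouping as above, written with case_when() instead of ifelse()
  mutate(Party2 = case_when(
    Party %in% c("DEM", "RP(K)") ~ "DRP",
    Party %in% c("REP", "BSP") ~ "RSP",
    TRUE ~ Party
  )) %>%
  # One row per county and party group
  group_by(County, Party2) %>%
  summarise(Votes.Scored.County = sum(Votes.Scored), .groups = "drop")

party_totals
```

This gives six rows, one for each combination of the two counties and the three party groups.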
