
Returning more than one variable using Group By and summarize with Dplyr

I'm trying to create a new column in my 2016 election dataset that shows whether the candidate lost or won a county.

library(dplyr)

Democrat %>%
  group_by(county) %>%
  summarise(winningvote = max(fraction_votes))

This code only returns the max vote. Can I also return the candidate variable? Adding:

select(county, fraction_votes, candidate)

doesn't return anything different.

I'll attempt to create an "outcome" variable using mutate for the last line of the code. I was thinking the apply family of functions might be another way to solve this.
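The mutate idea can work directly on grouped data, because mutate() keeps every row instead of collapsing each county to a single summary row. A minimal sketch, assuming columns named county, candidate and fraction_votes as in the question; the toy data, county names and candidate labels are hypothetical:

```r
library(dplyr)

# Hypothetical toy data using the question's column names;
# the values are made up for illustration only.
Democrat <- data.frame(
  county         = c("X", "X", "Y", "Y"),
  candidate      = c("cand_A", "cand_B", "cand_A", "cand_B"),
  fraction_votes = c(0.62, 0.38, 0.41, 0.59),
  stringsAsFactors = FALSE
)

# Within each county, compare every candidate's share with the county maximum.
# mutate() keeps all rows, so candidate stays in the data.
Democrat <- Democrat %>%
  group_by(county) %>%
  mutate(outcome = if_else(fraction_votes == max(fraction_votes),
                           "won", "lost")) %>%
  ungroup()

print(Democrat)
```

If two candidates tie for a county's maximum, both rows are marked "won".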

Thanks

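If candidate is a column of the Democrat data frame, the simplest way to keep it in the output is to group by both variables. Note that this gives one row per county and candidate (each candidate's own maximum), so it lists every candidate rather than only the county winner: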

Democrat %>%
  group_by(county, candidate) %>%
  summarise(winningvote = max(fraction_votes))
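To get only the winning row in each county while still carrying the candidate column, one option is to group by county alone and keep the top row with dplyr's slice_max() (available since dplyr 1.0.0). A minimal sketch; the toy data below is hypothetical:

```r
library(dplyr)

# Hypothetical toy data (made-up values) with the question's column names.
Democrat <- data.frame(
  county         = c("X", "X", "Y", "Y"),
  candidate      = c("cand_A", "cand_B", "cand_A", "cand_B"),
  fraction_votes = c(0.62, 0.38, 0.41, 0.59),
  stringsAsFactors = FALSE
)

# slice_max() keeps the row(s) with the largest fraction_votes in each county,
# so candidate is retained; tied rows are all kept by default.
winners <- Democrat %>%
  group_by(county) %>%
  slice_max(fraction_votes, n = 1) %>%
  ungroup() %>%
  select(county, candidate, winningvote = fraction_votes)

print(winners)
```

On older dplyr versions, filter(fraction_votes == max(fraction_votes)) after group_by(county) returns the same rows.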

I'm pretty confident there's a more succinct way to do this, but the code below flags each county's winning vote with 1. Then you simply replace NA with 0 (second block of code).

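Democrat <- left_join(Democrat,
  Democrat %>%
    group_by(county) %>%
    summarise(fraction_votes = max(fraction_votes)) %>%
    mutate(Winning_Vote = 1),  # 1 marks each county's top share
  by = c("county", "fraction_votes"))

# Rows that are not a county maximum get NA from the join; set them to 0
Democrat$Winning_Vote[is.na(Democrat$Winning_Vote)] <- 0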
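One shorter route that avoids the join and the NA cleanup, sketched in base R since the question also wondered about apply-style functions, is ave(): it applies a function within groups and returns a vector aligned with the original rows. This is an alternative technique, not the answer's own code; the toy data is hypothetical:

```r
# Hypothetical toy data (made-up values) with the question's column names.
Democrat <- data.frame(
  county         = c("X", "X", "Y", "Y"),
  candidate      = c("cand_A", "cand_B", "cand_A", "cand_B"),
  fraction_votes = c(0.62, 0.38, 0.41, 0.59),
  stringsAsFactors = FALSE
)

# ave() computes max(fraction_votes) within each county and repeats it
# on every row of that county, so the comparison below is row-aligned.
county_max <- ave(Democrat$fraction_votes, Democrat$county, FUN = max)

# as.integer() turns TRUE/FALSE into 1/0: 1 for the county's top row(s).
Democrat$Winning_Vote <- as.integer(Democrat$fraction_votes == county_max)

print(Democrat)
```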

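Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.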

 
© 2020-2024 STACKOOM.COM