Returning more than one variable using Group By and summarize with Dplyr
I'm trying to create a new column in my 2016 election dataset that shows whether the candidate won or lost a county.
Democrat %>%
  group_by(county) %>%
  summarise(winningvote = max(fraction_votes))
This code only returns the max vote. Can I also return the candidate variable? Adding:
select(county, fraction_votes, candidate)
Doesn't return anything different.
I'll attempt to create an "outcome" variable using mutate as the last line of the code. I was thinking the apply family might be another way to solve this.
Thanks
If candidate is a field of the Democrat data frame, the simplest way is to group by multiple columns:
Democrat %>%
  group_by(county, candidate) %>%
  summarise(winningvote = max(fraction_votes))
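Note that grouping by both county and candidate returns one row per county/candidate pair, not only the winner. If the goal is one row per county that carries the winning candidate along with the vote share, a grouped filter is one option. The following is a minimal sketch on made-up toy data (the county names, candidate labels, and vote shares are hypothetical, and the real Democrat data frame may have more columns):

```r
library(dplyr)

# Hypothetical toy data standing in for the real Democrat data frame.
Democrat <- tibble(
  county = c("Adams", "Adams", "Baker", "Baker"),
  candidate = c("A", "B", "A", "B"),
  fraction_votes = c(0.60, 0.40, 0.45, 0.55)
)

# Within each county, keep only the row(s) holding the top vote share,
# so the candidate column survives alongside the winning vote.
winners <- Democrat %>%
  group_by(county) %>%
  filter(fraction_votes == max(fraction_votes)) %>%
  ungroup() %>%
  rename(winningvote = fraction_votes)

print(winners)
```

If two candidates tie for the top share in a county, this keeps both rows, which may or may not be what you want.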
I'm pretty confident there's a more succinct way to do this, but the code below will give you a winning vote flag of 1. Then you simply replace NA with 0 (second block of code).
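library(dplyr)
# Assign the join result back to Democrat so the NA replacement below acts on the new flag column.
Democrat <- left_join(Democrat, (Democrat %>%
    group_by(county) %>%
    summarise(fraction_votes = max(fraction_votes)) %>%
    mutate(Winning_Vote = 1)),
  by = c("county", "fraction_votes"))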
Democrat[is.na(Democrat)] <- 0
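As for the "outcome" column the question mentions, a grouped mutate may be a more succinct route than the join, since it compares each row with its county's maximum in place. This is only a sketch on hypothetical toy data; the column name outcome and the labels "won" and "lost" are assumptions, not something from the original dataset:

```r
library(dplyr)

# Hypothetical toy data standing in for the real Democrat data frame.
Democrat <- tibble(
  county = c("Adams", "Adams", "Baker", "Baker"),
  candidate = c("A", "B", "A", "B"),
  fraction_votes = c(0.60, 0.40, 0.45, 0.55)
)

# Every row is kept; outcome is "won" where the row holds its county's
# top vote share and "lost" otherwise.
Democrat <- Democrat %>%
  group_by(county) %>%
  mutate(outcome = if_else(fraction_votes == max(fraction_votes), "won", "lost")) %>%
  ungroup()

print(Democrat)
```

Unlike the NA replacement above, this does not touch missing values in any other column of the data frame.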