Subtract values within groups in R

I have a dataset with variables that give the voteshare of each party in a given year and district, and whether or not that party sent a candidate to parliament, like this:

year district party voteshare candidate
2000 A        P1    50%       1
2000 A        P2    30%       0
2000 A        P3    20%       0
2000 B        P1    43%       1
2000 B        P2    21%       0
2000 B        P3    34%       0
...

Now I want to calculate each party's margin of loss or victory (i.e. how "close" the election was for that party). For a losing party, this is its voteshare minus the voteshare of the winning party (the party that sent a candidate to parliament). For the winning party, it is its voteshare minus the voteshare of the second most successful party, such that:

year district party voteshare candidate margin
2000 A        P1    50%       1         +20%
2000 A        P2    30%       0         -20%
2000 A        P3    20%       0         -30%
2000 B        P1    43%       1         +9%
2000 B        P2    21%       0         -22%
2000 B        P3    34%       0         -9%
...
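To make the rule concrete, here is the arithmetic for district B written out by hand in base R. This is only a sketch of the intended calculation for one district, not a general solution; the vector name `share` and the percent signs being dropped are choices made for the illustration.

```r
# District B vote shares in percent, copied from the table above
share <- c(P1 = 43, P2 = 21, P3 = 34)

# P1 sent the candidate, so P1 is the winner and P3 (34%) is the runner-up
margin <- c(
  P1 = share[["P1"]] - share[["P3"]],  # winner minus runner-up
  P2 = share[["P2"]] - share[["P1"]],  # loser minus winner
  P3 = share[["P3"]] - share[["P1"]]   # loser minus winner
)

print(margin)
```

The result is 9 for P1, -22 for P2 and -9 for P3, which matches the margin column for district B.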

I don't know how to do that with dplyr...

You can do it like this:

library(dplyr)

df1 %>%
  #Turn voteshare to a number
  mutate(voteshare = readr::parse_number(voteshare)) %>%
  group_by(year, district) %>%
  #When candidate is sent to parliament
  mutate(margin = case_when(candidate == 1 ~ 
                            #Subtract with second highest voteshare
                            voteshare - sort(voteshare, decreasing = TRUE)[2],
                            #else subtract with voteshare of highest candidate
                            TRUE ~ voteshare - voteshare[candidate == 1]))

#   year district party voteshare candidate margin
#  <int> <chr>    <chr>     <dbl>     <int>  <dbl>
#1  2000 A        P1           50         1     20
#2  2000 A        P2           30         0    -20
#3  2000 A        P3           20         0    -30
#4  2000 B        P1           43         1      9
#5  2000 B        P2           21         0    -22
#6  2000 B        P3           34         0     -9
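The margin column above is numeric, while the desired output in the question shows signed percent strings such as +20%. If that presentation is wanted, one possible way is base R's `sprintf()` with the `+` flag. The sketch below works on a plain vector holding the six margins from the output; the name `margin_label` is made up for the example.

```r
# Numeric margins in percentage points, copied from the output above
margin <- c(20, -20, -30, 9, -22, -9)

# "%+g" always prints a sign; "%%" is a literal percent character
margin_label <- sprintf("%+g%%", margin)

print(margin_label)
# "+20%" "-20%" "-30%" "+9%" "-22%" "-9%"
```

Note that a margin of exactly 0 would be printed as "+0%" with this format, so a tie would need separate handling if that matters.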

Data:

df1 <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L
), district = c("A", "A", "A", "B", "B", "B"), party = c("P1", 
"P2", "P3", "P1", "P2", "P3"), voteshare = c("50%", "30%", "20%", 
"43%", "21%", "34%"), candidate = c(1L, 0L, 0L, 1L, 0L, 0L)), 
class = "data.frame", row.names = c(NA, -6L))

Here is a solution using the data.table package:

library(data.table)

df1 <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L
), district = c("A", "A", "A", "B", "B", "B"), party = c("P1", 
"P2", "P3", "P1", "P2", "P3"), voteshare = c("50%", "30%", "20%", 
"43%", "21%", "34%"), candidate = c(1L, 0L, 0L, 1L, 0L, 0L)), 
class = "data.frame", row.names = c(NA, -6L))

setDT(df1)

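# Step 1: store the numeric vote share in `margin` (strip the percent sign)
df1[, margin := as.numeric(gsub("%", "", voteshare))][
    # Step 2: within each election (year and district), the winner gets the gap
    # between the two highest shares; every other party gets its gap to the
    # highest share. This assumes the party with candidate == 1 has the
    # highest voteshare in its group.
    , margin := fcase(candidate == 1, diff(tail(sort(margin), 2)),
                      candidate == 0, margin - max(margin)),
    by = .(year, district)][
    # Step 3: format the result as a signed percent string
    , margin := fcase(margin < 0, sprintf("%s%%", margin),
                      margin > 0, sprintf("+%s%%", margin),
                      margin == 0, "0%")]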

df1
#>    year district party voteshare candidate margin
#> 1: 2000        A    P1       50%         1   +20%
#> 2: 2000        A    P2       30%         0   -20%
#> 3: 2000        A    P3       20%         0   -30%
#> 4: 2000        B    P1       43%         1    +9%
#> 5: 2000        B    P2       21%         0   -22%
#> 6: 2000        B    P3       34%         0    -9%
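Both answers above find the runner-up by sorting the vote shares, which implicitly assumes that the party with candidate == 1 also has the highest voteshare in its group. If that cannot be guaranteed, the winner can be picked by the candidate flag instead. The following is a minimal base R sketch under the assumption that every year/district group has exactly one winner; `margin_one_race`, `share`, `races` and `result` are names invented for this example.

```r
# Example data from the question
df <- data.frame(
  year      = 2000L,
  district  = c("A", "A", "A", "B", "B", "B"),
  party     = c("P1", "P2", "P3", "P1", "P2", "P3"),
  voteshare = c("50%", "30%", "20%", "43%", "21%", "34%"),
  candidate = c(1L, 0L, 0L, 1L, 0L, 0L)
)

# Numeric vote share without the percent sign
df$share <- as.numeric(sub("%", "", df$voteshare, fixed = TRUE))

# Compute the margins for the rows of a single year/district group
margin_one_race <- function(rows) {
  winner <- rows$candidate == 1
  stopifnot(sum(winner) == 1)              # assumption: exactly one winner
  runner_up <- max(rows$share[!winner])    # best share among non-winners
  rows$margin <- ifelse(winner,
                        rows$share - runner_up,
                        rows$share - rows$share[winner])
  rows
}

# Split by year and district, apply the helper, and stack the pieces again
races  <- split(df, list(df$year, df$district), drop = TRUE)
result <- do.call(rbind, lapply(races, margin_one_race))
rownames(result) <- NULL

print(result[, c("year", "district", "party", "candidate", "margin")])
```

For the example data this gives the same margins as the answers above (20, -20, -30, 9, -22, -9), because the winner is also the top vote-getter in both districts.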
