简体   繁体   English

R 中的加权连接或匹配

[英]weighted join or match in R

I am working with election data from California's Statewide Database ( https://statewidedatabase.org/election.html ).我正在处理来自加州全州数据库 ( https://statewidedatabase.org/election.html ) 的选举数据。 I am trying to convert their precinct level election results to 2010 census block level results.我正在尝试将他们的选区级别选举结果转换为 2010 年人口普查区块级别的结果。 I have the precinct level election results我有选区级别的选举结果

> sov_results
# A tibble: 20,744 x 136
   COUNTY FIPS  SRPREC_KEY SRPREC ADDIST CDDIST SDDIST BEDIST TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
    <dbl> <chr> <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1     49 06097 060971002    1002      2      5      2      2     29      0      0      0      0      0      0      0      0      0      18
 2     49 06097 060971003    1003      2      2      2      2      1      0      0      0      0      0      0      0      0      0       0
 3     49 06097 060971005    1005      2      2      2      2    106      0      0      0      0      0      0      0      0      0      67
 4     49 06097 060971006    1006      2      5      2      2      2      0      0      0      0      0      0      0      0      0       2
 5     49 06097 060971007    1007      2      2      2      2     56      0      0      0      0      0      0      0      0      0      42
 6     49 06097 060971008    1008      2      5      2      2    148      0      0      0      0      0      0      0      0      0     109
 7     49 06097 060971009    1009      2      5      2      2    137      0      0      0      0      0      0      0      0      0      97
 8     49 06097 060971012    1012      2      5      2      2     21      0      0      0      0      0      0      0      0      0      16
 9     49 06097 060971017    1017      4      5      2      2    723      0      0      0      0      0      0      0      0      0     591
10     49 06097 060971018    1018      2      2      2      2     14      0      0      0      0      0      0      0      0      0      10
# ... with 20,734 more rows, and 117 more variables: DEMVOTE <dbl>, REPVOTE <dbl>, AIPVOTE <dbl>, GRNVOTE <dbl>, LIBVOTE <dbl>,
#   NLPVOTE <dbl>, REFVOTE <dbl>, DCLVOTE <dbl>, MSCVOTE <dbl>, PRCVOTE <dbl>, ABSVOTE <dbl>, ASSDEM01 <dbl>, ASSDEM02 <dbl>,
#   ASSDEM03 <dbl>, ASSDEM04 <dbl>, ASSDEM05 <dbl>, ASSDEM06 <dbl>, ASSDEM07 <dbl>, ASSDEM08 <dbl>, ASSGRN01 <dbl>, ASSIND01 <dbl>,
#   ASSLIB01 <dbl>, ASSPAF01 <dbl>, ASSREP01 <dbl>, ASSREP02 <dbl>, ASSREP03 <dbl>, ASSREP04 <dbl>, CNGAIP01 <dbl>, CNGDEM01 <dbl>,
#   CNGDEM02 <dbl>, CNGDEM03 <dbl>, CNGDEM04 <dbl>, CNGDEM05 <dbl>, CNGDEM06 <dbl>, CNGDEM07 <dbl>, CNGDEM08 <dbl>, CNGDEM09 <dbl>,

As well as the conversion key with the weights.以及带有权重的转换键。

> conversion
# A tibble: 398,299 x 13
   SRPREC FIPS  ELECTION TYPE   SRPREC_KEY BLOCK_KEY        TRACT BLOCK BLKREG SRTOTREG PCTSRPREC BLKTOTREG PCTBLK
    <dbl> <chr> <chr>    <chr>  <chr>      <chr>            <dbl> <dbl>  <dbl>    <dbl>     <dbl>     <dbl>  <dbl>
 1     NA 06097 p20      sr_blk 06097nan   060970000000000      0     0      1       NA     NA            1  100  
 2   1002 06097 p20      sr_blk 060971002  060971525011014 152501  1014     26       29     89.7         26  100  
 3   1002 06097 p20      sr_blk 060971002  060971525013008 152501  3008      3       29     10.3          3  100  
 4   1003 06097 p20      sr_blk 060971003  060971526005068 152600  5068      1        1    100            1  100  
 5   1005 06097 p20      sr_blk 060971005  060971526005000 152600  5000     14      106     13.2         43   32.6
 6   1005 06097 p20      sr_blk 060971005  060971526005003 152600  5003     12      106     11.3         12  100  
 7   1005 06097 p20      sr_blk 060971005  060971526005004 152600  5004     12      106     11.3         20   60  
 8   1005 06097 p20      sr_blk 060971005  060971526005006 152600  5006      5      106      4.72         5  100  
 9   1005 06097 p20      sr_blk 060971005  060971526005008 152600  5008     24      106     22.6         24  100  
10   1005 06097 p20      sr_blk 060971005  060971526005020 152600  5020     28      106     26.4         28  100 

I want to know how to match these precinct results to the census block in such a way that the census block is given the right amount of votes from the precinct results (based on the PCTSRPREC column which indicates what percentage of of the precinct belongs in the census block).我想知道如何将这些选区结果与人口普查区相匹配,以便普查区从选区结果中获得适当数量的选票(基于 PCTSRPREC 列,该列指示该选区的百分比属于人口普查区)。

For example, I would want to join so that 13.2% of SRPREC_KEY 060971005 is assigned to BLOCK 5000. That would be 13.2% of the TOTVOTE (rounded to a whole number), 13.2% of DEMVOTE, 13.2% of ASSDEM03 vote, etc. Is there a function or way to do this in R?例如,我想加入,以便将 SRPREC_KEY 060971005 的 13.2% 分配给 BLOCK 5000。这将是 TOTVOTE 的 13.2%(四舍五入)、DEMVOTE 的 13.2%、ASSDEM03 投票的 13.2%,等等。在 R 中是否有 function 或方法可以做到这一点?

I think you're looking for a join/merge operation and then a simple multiply.我认为您正在寻找加入/合并操作,然后是简单的乘法。

library(dplyr)
select(conversion, SRPREC_KEY, BLOCK, PCTSRPREC) %>%
  left_join(., sov_results, by = "SRPREC_KEY") %>%
  mutate(across(TOTREG:TOTVOTE, ~ . * PCTSRPREC / 100))
#    SRPREC_KEY BLOCK PCTSRPREC COUNTY FIPS SRPREC ADDIST CDDIST SDDIST BEDIST  TOTREG DEMREG REPREG AIPREG GRNREG LIBREG NLPREG REFREG DCLREG MSCREG TOTVOTE
# 1    06097nan     0        NA     NA   NA     NA     NA     NA     NA     NA      NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      NA
# 2   060971002  1014     89.70     49 6097   1002      2      5      2      2 26.0130      0      0      0      0      0      0      0      0      0 16.1460
# 3   060971002  3008     10.30     49 6097   1002      2      5      2      2  2.9870      0      0      0      0      0      0      0      0      0  1.8540
# 4   060971003  5068    100.00     49 6097   1003      2      2      2      2  1.0000      0      0      0      0      0      0      0      0      0  0.0000
# 5   060971005  5000     13.20     49 6097   1005      2      2      2      2 13.9920      0      0      0      0      0      0      0      0      0  8.8440
# 6   060971005  5003     11.30     49 6097   1005      2      2      2      2 11.9780      0      0      0      0      0      0      0      0      0  7.5710
# 7   060971005  5004     11.30     49 6097   1005      2      2      2      2 11.9780      0      0      0      0      0      0      0      0      0  7.5710
# 8   060971005  5006      4.72     49 6097   1005      2      2      2      2  5.0032      0      0      0      0      0      0      0      0      0  3.1624
# 9   060971005  5008     22.60     49 6097   1005      2      2      2      2 23.9560      0      0      0      0      0      0      0      0      0 15.1420
# 10  060971005  5020     26.40     49 6097   1005      2      2      2      2 27.9840      0      0      0      0      0      0      0      0      0 17.6880

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM