R: Creating a wide data.frame from a long data.frame containing redundancies, without creating a list

The ICPSR database is a massive ASCII-coded dataset of US elections that was created before modern coding standards were put in place. I have written a script that extracts all the data into a long-format data.frame with columns for county, vote total, and "party-year" (e.g., Democrat_1992, Republican_1992, Reform_1992, etc.).

The problem is that these data were created in a patchwork by multiple authors over multiple years, so there are numerous duplicates and inefficiencies. For example, in the Arizona returns, you will encounter the following:

county    votes   header
PIMA      0       DEMOCRAT_1944
PIMA      13,006  DEMOCRAT_1944
MARICOPA  32,197  DEMOCRAT_1944
MARICOPA  0       DEMOCRAT_1944
PIMA      3,392   REPUBLICAN_1944
MARICOPA  24,853  REPUBLICAN_1944

The problem is that when you pivot this to wide format, R creates a list-column for "DEMOCRAT_1944" in which, for example, the Maricopa entry is c(32197, 0). Worse, the duplication is inconsistent: most data are entered correctly (e.g., the data for REPUBLICAN_1944 appear only once per county, so they convert to wide format cleanly).

I am at a bit of a loss on how to fix this. Obviously it would be easy in this table to do it by brute force, but we're talking about 503,371 observations in the overall data.frame. It isn't consistent which party or year is redundant, so any solution would have to be very general. Also, some counties will have "legitimate" zeroes in them, so simply eliminating those rows containing zero can't be the solution.

I used the following code to convert from long to wide:

library(dplyr)
library(tidyr)
state_df2 <- state_df %>%
  pivot_wider(names_from = header, values_from = votes)

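Convert votes to numeric first, then collapse the duplicates while pivoting by passing values_fn to pivot_wider. With values_fn = sum, the redundant zero rows add nothing to the real totals, and a county whose only entry is a legitimate zero stays zero:

library(tidyverse)  # loads dplyr, tidyr, and stringr

state_df %>%
  # str_remove_all() strips every comma, so totals such as 1,234,567 also parse
  mutate(votes = as.numeric(str_remove_all(votes, ','))) %>%
  pivot_wider(names_from = header, values_from = votes, values_fn = sum)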

 county   DEMOCRAT_1944 REPUBLICAN_1944
  <chr>            <dbl>           <dbl>
1 PIMA             13006            3392
2 MARICOPA         32197           24853
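
Summing is only safe if every redundant row is a zero. If the full data might also contain a real total entered twice, sum would double it. The following is a minimal sketch of one way to check for that before pivoting, and of using max instead of sum as the collapsing function. The toy state_df is rebuilt from the sample rows in the question, and the names clean_df, dupes, n_rows, n_nonzero, and wide_df are illustrative, not part of the original script.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy data copied from the Arizona sample in the question.
state_df <- tibble(
  county = c("PIMA", "PIMA", "MARICOPA", "MARICOPA", "PIMA", "MARICOPA"),
  votes  = c("0", "13,006", "32,197", "0", "3,392", "24,853"),
  header = c(rep("DEMOCRAT_1944", 4), rep("REPUBLICAN_1944", 2))
)

# Strip thousands separators and convert the vote totals to numeric.
clean_df <- state_df %>%
  mutate(votes = as.numeric(str_remove_all(votes, ",")))

# Step 1: list every county/header pair that occurs more than once,
# with the number of distinct non-zero totals recorded for it.
dupes <- clean_df %>%
  group_by(county, header) %>%
  summarise(
    n_rows    = n(),
    n_nonzero = n_distinct(votes[votes != 0]),
    .groups   = "drop"
  ) %>%
  filter(n_rows > 1)

# Step 2: stop if any pair carries two different non-zero totals,
# because those rows need a manual decision rather than an automatic rule.
stopifnot(all(dupes$n_nonzero <= 1))

# Step 3: max() keeps the single real total, ignores the spurious zeros,
# and does not double a total that was entered twice.
wide_df <- clean_df %>%
  pivot_wider(names_from = header, values_from = votes, values_fn = max)
```

On the sample rows, dupes lists the two DEMOCRAT_1944 pairs, and wide_df matches the output shown above. Inspecting dupes on the full data should show whether sum, max, or a manual fix is the appropriate choice for a given state.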
