简体   繁体   中英

Transpose or gather wide to long dataframe with multiple keys and values

I'm trying to transpose a wide dataset to a long tidy one. I use the tidyr::gather() function alot for these kind of tasks, only now I have a pretty weird dataset.

The following is a small version of mine. As you can imagine that the columns with __1 behind them get repeated till number __16 or something in my real datframe. Is this possible to fix with tidyr or dplyr tools?

# A tibble: 1 x 10
   code city      party_short party_long           votes seats party_short__1 party_long__1    votes__1 seats__1
  <dbl> <chr>     <chr>       <chr>                <dbl> <dbl> <chr>          <chr>               <dbl>    <dbl>
1  3630 Amsterdam PVDA        Partij van de Arbeid  1833  5.00 HARLBEL        Harlinger Belang       942     2.00 

And for reproducability:

library(tidyverse)

df <- tibble(code = 3630,
        city = "Amsterdam",
        party_short = "PVDA",
        party_long = "Partij van de Arbeid",
        votes = 1833,
        seats = 5,
        party_short__1 = "HARLBEL",
        party_long__1 = "Harlinger Belang",
        votes__1 = 942,
        seats__1 = 2)

With a desired output:

# A tibble: 2 x 6
   code city      party_short party_long           votes seats
  <dbl> <chr>     <chr>       <chr>                <dbl> <dbl>
1  3630 Amsterdam PVDA        Partij van de Arbeid  1833  5.00
2  3630 Amsterdam HARLBEL     Harlinger Belang       942  2.00

We can gather all columns, separate the column based on "__", and then spread the data frame.

library(tidyverse)

df2 <- df %>%
  gather(Column, Value, -code, -city) %>%
  separate(Column, into = c("Column", "Number"), sep = "__") %>%
  spread(Column, Value) %>%
  select(-Number)
df2
# # A tibble: 2 x 6
#   code city      party_long           party_short seats votes
#   <dbl> <chr>     <chr>                <chr>       <chr> <chr>
# 1 3630. Amsterdam Harlinger Belang     HARLBEL     2     942  
# 2 3630. Amsterdam Partij van de Arbeid PVDA        5     1833

I am using a combination of data.table and tidyr below

library(data.table)
library(tidyr)

setDT(df)

melt(df, id.vars = c('code', 'city')) %>% separate(variable, c('vv', 'bb'), '__') %>%
  dcast(code + city + bb ~ vv, value.var = 'value') %>% mutate(bb = NULL)

  code      city           party_long party_short seats votes
1 3630 Amsterdam     Harlinger Belang     HARLBEL     2   942
2 3630 Amsterdam Partij van de Arbeid        PVDA     5  1833

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM