
Reshape table using tidyr

I have a big table in long format that I am trying to reshape into wide format using tidyr. Because the table is large, this is proving more complicated than I thought.

The table looks like this:

Codes      areas  var1  var2  var3
1111       1010    2      2    34
1112       1010    3      7    18
1113       1010    20     12   11
1114       1010    19     11   22
[...]      [...]   [...]  [...]  [...]
1111       1020    14     19   12
1112       1020    10     10   13

The goal would be to obtain one row per area with the variables in wide format.

Like:

Area  1111Var1 1111Var2 1111Var3 1112Var1 1112Var2 1112Var3
1010         2        2       34        3        7       18

So far I have tried spread (from tidyr) and mutate (from dplyr), but without much success.

You'll need three tidyr steps here:

d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "") %>%
  spread(combined, value)

Where d is your data.
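gather() and spread() still work, but in tidyr 1.0 and later they are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with the newer verbs, assuming tidyr >= 1.0 and the sample data from the question (built here with tibble, which is installed alongside tidyr). Because pivot_wider() accepts several names_from columns, the unite() step is folded into names_sep:

```r
library(tidyr)

# Sample data from the question
d <- tibble::tibble(
  Codes = c(1111, 1112, 1113, 1114, 1111, 1112),
  areas = c(1010, 1010, 1010, 1010, 1020, 1020),
  var1  = c(2, 3, 20, 19, 14, 10),
  var2  = c(2, 7, 12, 11, 19, 10),
  var3  = c(34, 18, 11, 22, 12, 13)
)

wide <- d %>%
  # one row per (Codes, areas, key), like gather()
  pivot_longer(c(var1, var2, var3), names_to = "key", values_to = "value") %>%
  # names glued from Codes and key, e.g. "1111var1"; areas stays as the id column
  pivot_wider(names_from = c(Codes, key), values_from = value, names_sep = "")

wide
```

As with spread(), area 1020 gets NA for the codes it does not contain.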


To explain the steps:

library(tidyr)

# setting up data
d <- readr::read_delim("Codes areas var1 var2 var3
1111 1010 2 2 34
1112 1010 3 7 18
1113 1010 20 12 11
1114 1010 19 11 22
1111 1020 14 19 12
1112 1020 10 10 13", delim = " ")

First you need to gather the var1, var2 and var3 columns into key/value pairs:

d %>%
  gather(key, value, -Codes, -areas)
#> Source: local data frame [18 x 4]
#> 
#>    Codes areas    key value
#>    (int) (int) (fctr) (int)
#> 1   1111  1010   var1     2
#> 2   1112  1010   var1     3
#> 3   1113  1010   var1    20
#> 4   1114  1010   var1    19
#> 5   1111  1020   var1    14
#> 6   1112  1020   var1    10
#> 7   1111  1010   var2     2
#> 8   1112  1010   var2     7
#> 9   1113  1010   var2    12
#> 10  1114  1010   var2    11
#> 11  1111  1020   var2    19
#> 12  1112  1020   var2    10
#> 13  1111  1010   var3    34
#> 14  1112  1010   var3    18
#> 15  1113  1010   var3    11
#> 16  1114  1010   var3    22
#> 17  1111  1020   var3    12
#> 18  1112  1020   var3    13
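As a sanity check on this step: gather() stacks the three measure columns one after another, so the 6 input rows become 6 × 3 = 18 rows, with all var1 values first, as in the output above. pivot_longer(), its newer replacement, keeps each input row together instead; the row order does not matter for the final spread. A short sketch comparing the two on the question's sample data (the object names are just for illustration):

```r
library(tidyr)

d <- tibble::tibble(
  Codes = c(1111, 1112, 1113, 1114, 1111, 1112),
  areas = c(1010, 1010, 1010, 1010, 1020, 1020),
  var1  = c(2, 3, 20, 19, 14, 10),
  var2  = c(2, 7, 12, 11, 19, 10),
  var3  = c(34, 18, 11, 22, 12, 13)
)

long_gather <- d %>% gather(key, value, -Codes, -areas)
long_pivot  <- d %>% pivot_longer(c(var1, var2, var3), names_to = "key", values_to = "value")

# Both give 6 rows x 3 measure columns = 18 rows
nrow(long_gather)
nrow(long_pivot)

# gather() stacks column by column: var1 for every input row comes first
long_gather$key[1:6]
# pivot_longer() goes row by row: var1, var2, var3 of the first input row
long_pivot$key[1:3]
```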

Then combine the key column with the Codes column using tidyr's unite:

d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "")
#> Source: local data frame [18 x 3]
#> 
#>    combined areas value
#>       (chr) (int) (int)
#> 1  1111var1  1010     2
#> 2  1112var1  1010     3
#> 3  1113var1  1010    20
#> 4  1114var1  1010    19
#> 5  1111var1  1020    14
#> 6  1112var1  1020    10
#> 7  1111var2  1010     2
#> 8  1112var2  1010     7
#> 9  1113var2  1010    12
#> 10 1114var2  1010    11
#> 11 1111var2  1020    19
#> 12 1112var2  1020    10
#> 13 1111var3  1010    34
#> 14 1112var3  1010    18
#> 15 1113var3  1010    11
#> 16 1114var3  1010    22
#> 17 1111var3  1020    12
#> 18 1112var3  1020    13
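spread() can succeed at this point because each (areas, combined) pair now identifies exactly one value; spread() stops with an error when such a pair repeats. A quick check of that precondition, sketched on the question's sample data (the names long and dup_index are illustrative):

```r
library(tidyr)

d <- tibble::tibble(
  Codes = c(1111, 1112, 1113, 1114, 1111, 1112),
  areas = c(1010, 1010, 1010, 1010, 1020, 1020),
  var1  = c(2, 3, 20, 19, 14, 10),
  var2  = c(2, 7, 12, 11, 19, 10),
  var3  = c(34, 18, 11, 22, 12, 13)
)

long <- d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "")

# anyDuplicated() returns 0 when no (areas, combined) pair repeats,
# i.e. every cell of the wide table will receive at most one value
dup_index <- anyDuplicated(long[c("areas", "combined")])
dup_index
```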

Now spread will work, turning each combined value into its own column:

d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "") %>%
  spread(combined, value)
#> Source: local data frame [2 x 13]
#> 
#>   areas 1111var1 1111var2 1111var3 1112var1 1112var2 1112var3 1113var1
#>   (int)    (int)    (int)    (int)    (int)    (int)    (int)    (int)
#> 1  1010        2        2       34        3        7       18       20
#> 2  1020       14       19       12       10       10       13       NA
#> Variables not shown: 1113var2 (int), 1113var3 (int), 1114var1 (int),
#>   1114var2 (int), 1114var3 (int)
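The NA cells appear because area 1020 has no rows for codes 1113 and 1114. If, for your data, a missing code should count as 0 rather than as unknown (an assumption about the data, not something the question states), spread()'s fill argument can supply that value:

```r
library(tidyr)

d <- tibble::tibble(
  Codes = c(1111, 1112, 1113, 1114, 1111, 1112),
  areas = c(1010, 1010, 1010, 1010, 1020, 1020),
  var1  = c(2, 3, 20, 19, 14, 10),
  var2  = c(2, 7, 12, 11, 19, 10),
  var3  = c(34, 18, 11, 22, 12, 13)
)

wide_filled <- d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "") %>%
  # fill = 0: code/variable combinations absent from an area become 0 instead of NA
  spread(combined, value, fill = 0)

wide_filled
```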

I was able to do this in the following way, although it may not be the best or most efficient approach:

library(dplyr)  # mutate(), select()
library(tidyr)  # gather(), spread()

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Codes      areas  var1  var2  var3
1111       1010    2      2    34
1112       1010    3      7    18
1113       1010    20     12   11
1114       1010    19     11   22
1111       1020    14     19   12
1112       1020    10     10   13')

df_new <-
  df %>%
  gather(var_type, var_value, -areas, -Codes) %>%
  mutate(var_code = paste(Codes, var_type, sep = '_')) %>%
  select(-Codes, -var_type) %>%
  spread(var_code, var_value)

df_new

#  areas 1111_var1 1111_var2 1111_var3 1112_var1 1112_var2 1112_var3 1113_var1 1113_var2 1113_var3 1114_var1 1114_var2 1114_var3
#1  1010         2         2        34         3         7        18        20        12        11        19        11        22
#2  1020        14        19        12        10        10        13        NA        NA        NA        NA        NA        NA

I hope this helps.

EDIT

Here is a version of the above solution that uses unite instead, as in @David Robinson's answer:

df %>%
  gather(var_type, var_value, -areas, -Codes) %>%
  unite(NewCode, Codes, var_type, sep = '') %>%
  spread(NewCode, var_value)
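With tidyr 1.0 or later, the gather/unite/spread chain can also be collapsed into a single pivot_wider() call that spreads all three var columns at once. A sketch assuming tidyr >= 1.0, with names_glue used to reproduce the 1111_var1 naming from the first version of this answer:

```r
library(tidyr)

d <- tibble::tibble(
  Codes = c(1111, 1112, 1113, 1114, 1111, 1112),
  areas = c(1010, 1010, 1010, 1010, 1020, 1020),
  var1  = c(2, 3, 20, 19, 14, 10),
  var2  = c(2, 7, 12, 11, 19, 10),
  var3  = c(34, 18, 11, 22, 12, 13)
)

wide <- d %>%
  pivot_wider(
    names_from  = Codes,                # one set of columns per code
    values_from = c(var1, var2, var3),  # spread all three measures at once
    names_glue  = "{Codes}_{.value}"    # e.g. "1111_var1"
  )

wide
```

By default the new columns come out grouped by variable (all var1 columns first). In tidyr 1.2.0 and later, adding names_vary = "slowest" should group them by code instead, matching the column order shown earlier.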

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM