Reshape table using tidyr

I have a big table in long format that I am trying to reshape into wide format using tidyr. The table is large, and this is proving more complicated than I thought.

The table looks like this:

Codes      areas  var1  var2  var3
1111       1010    2      2    34
1112       1010    3      7    18
1113       1010    20     12   11
1114       1010    19     11   22
[...]      [...]   [...]  [...]  [...]
1111       1020    14     19   12
1112       1020    10     10   13

The goal is to obtain one row per area, with the variables for each code in wide format:

Area  1111Var1  1111Var2  1111Var3  1112Var1  1112Var2  1112Var3
1010  2         2         34        3         7         18

So far I have tried tidyr's spread and dplyr's mutate, but without much success.

You'll need three tidyr steps here:

d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "") %>%
  spread(combined, value)

Here d is your data frame.

To explain the steps, start by setting up the example data:


# load tidyr (gather, unite, spread, and the %>% pipe), then set up the data
library(tidyr)
d <- readr::read_delim("Codes areas var1 var2 var3
1111 1010 2 2 34
1112 1010 3 7 18
1113 1010 20 12 11
1114 1010 19 11 22
1111 1020 14 19 12
1112 1020 10 10 13", delim = " ")

First you need to gather the var1, var2, var3 columns:

d %>%
  gather(key, value, -Codes, -areas)
#> Source: local data frame [18 x 4]
#>    Codes areas    key value
#>    (int) (int) (fctr) (int)
#> 1   1111  1010   var1     2
#> 2   1112  1010   var1     3
#> 3   1113  1010   var1    20
#> 4   1114  1010   var1    19
#> 5   1111  1020   var1    14
#> 6   1112  1020   var1    10
#> 7   1111  1010   var2     2
#> 8   1112  1010   var2     7
#> 9   1113  1010   var2    12
#> 10  1114  1010   var2    11
#> 11  1111  1020   var2    19
#> 12  1112  1020   var2    10
#> 13  1111  1010   var3    34
#> 14  1112  1010   var3    18
#> 15  1113  1010   var3    11
#> 16  1114  1010   var3    22
#> 17  1111  1020   var3    12
#> 18  1112  1020   var3    13

Then combine the new key column with the Codes column using tidyr's unite:

d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "")
#> Source: local data frame [18 x 3]
#>    combined areas value
#>       (chr) (int) (int)
#> 1  1111var1  1010     2
#> 2  1112var1  1010     3
#> 3  1113var1  1010    20
#> 4  1114var1  1010    19
#> 5  1111var1  1020    14
#> 6  1112var1  1020    10
#> 7  1111var2  1010     2
#> 8  1112var2  1010     7
#> 9  1113var2  1010    12
#> 10 1114var2  1010    11
#> 11 1111var2  1020    19
#> 12 1112var2  1020    10
#> 13 1111var3  1010    34
#> 14 1112var3  1010    18
#> 15 1113var3  1010    11
#> 16 1114var3  1010    22
#> 17 1111var3  1020    12
#> 18 1112var3  1020    13

Now spread will work:

d %>%
  gather(key, value, -Codes, -areas) %>%
  unite(combined, Codes, key, sep = "") %>%
  spread(combined, value)
#> Source: local data frame [2 x 13]
#>   areas 1111var1 1111var2 1111var3 1112var1 1112var2 1112var3 1113var1
#>   (int)    (int)    (int)    (int)    (int)    (int)    (int)    (int)
#> 1  1010        2        2       34        3        7       18       20
#> 2  1020       14       19       12       10       10       13       NA
#> Variables not shown: 1113var2 (int), 1113var3 (int), 1114var1 (int),
#>   1114var2 (int), 1114var3 (int)
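
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a minimal sketch (not part of the original answer), the three steps above could be written with the newer functions as follows. The object names wide and wide_direct are illustrative only, and the second pipeline assumes a tidyr version that supports the names_glue argument.

```r
library(tidyr)

# Same sample data as above, built directly as a data frame
d <- data.frame(
  Codes = c(1111L, 1112L, 1113L, 1114L, 1111L, 1112L),
  areas = c(1010L, 1010L, 1010L, 1010L, 1020L, 1020L),
  var1  = c(2, 3, 20, 19, 14, 10),
  var2  = c(2, 7, 12, 11, 19, 10),
  var3  = c(34, 18, 11, 22, 12, 13)
)

# Three steps, mirroring gather -> unite -> spread
wide <- d %>%
  pivot_longer(c(var1, var2, var3), names_to = "key", values_to = "value") %>%
  unite(combined, Codes, key, sep = "") %>%
  pivot_wider(names_from = combined, values_from = value)

# One step: let pivot_wider build the combined names itself.
# {Codes} is the names_from column, {.value} is the name of each value column.
wide_direct <- d %>%
  pivot_wider(
    names_from  = Codes,
    values_from = c(var1, var2, var3),
    names_glue  = "{Codes}{.value}"
  )

wide
wide_direct
```

Both results have one row per area and the same set of columns, but the column order differs: the three-step version groups columns by code, while the one-step version groups them by variable.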

I was able to do this in the following way, though it may not be the best or most efficient approach:

library(dplyr)  # mutate, select, %>%
library(tidyr)  # gather, spread

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Codes      areas  var1  var2  var3
1111       1010    2      2    34
1112       1010    3      7    18
1113       1010    20     12   11
1114       1010    19     11   22
1111       1020    14     19   12
1112       1020    10     10   13')

df_new <-
  df %>%
  gather(var_type, var_value, -areas, -Codes) %>%
  mutate(var_code = paste(Codes, var_type, sep = '_')) %>%
  select(-Codes, -var_type) %>%
  spread(var_code, var_value)


#  areas 1111_var1 1111_var2 1111_var3 1112_var1 1112_var2 1112_var3 1113_var1 1113_var2 1113_var3 1114_var1 1114_var2 1114_var3
#1  1010         2         2        34         3         7        18        20        12        11        19        11        22
#2  1020        14        19        12        10        10        13        NA        NA        NA        NA        NA        NA

I hope this helps.


Here is a version of the above solution that uses unite instead, as in @David Robinson's answer:

df %>%
  gather(var_type, var_value, -areas, -Codes) %>%
  unite(NewCode, Codes, var_type, sep = '') %>%
  spread(NewCode, var_value)
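
To see that the two variants above differ only in how the combined column name is built, here is a small sketch that runs both on the sample data and compares the results. It uses sep = "_" in unite so the column names match the paste version; the names via_paste and via_unite are illustrative, not from the original answers.

```r
library(dplyr)
library(tidyr)

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Codes      areas  var1  var2  var3
1111       1010    2      2    34
1112       1010    3      7    18
1113       1010    20     12   11
1114       1010    19     11   22
1111       1020    14     19   12
1112       1020    10     10   13')

# Variant 1: build the combined name by hand, then drop the source columns
via_paste <- df %>%
  gather(var_type, var_value, -areas, -Codes) %>%
  mutate(var_code = paste(Codes, var_type, sep = "_")) %>%
  select(-Codes, -var_type) %>%
  spread(var_code, var_value)

# Variant 2: unite pastes the columns together and removes them in one call
via_unite <- df %>%
  gather(var_type, var_value, -areas, -Codes) %>%
  unite(var_code, Codes, var_type, sep = "_") %>%
  spread(var_code, var_value)

# Compare the two wide tables
all.equal(via_paste, via_unite)
```

The design point is that unite replaces the mutate, paste, and select steps with a single call, so the choice between the two is mostly about readability.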
