I have a large table in long format that I am trying to reshape into wide format using tidyr. This is proving more complicated than I thought.
The table looks like this:
Codes areas var1 var2 var3
1111 1010 2 2 34
1112 1010 3 7 18
1113 1010 20 12 11
1114 1010 19 11 22
[...] [...] [...] [...] [...]
1111 1020 14 19 12
1112 1020 10 10 13
The goal is to obtain one row per area, with the variables in wide format, like:
Area 1111Var1 1111Var2 1111Var3 1112Var1 1112Var2 1112Var3
1010 2 2 34 3 7 18
So far I have tried spread (tidyr) and mutate (dplyr), but without much success.
You'll need three tidyr steps here:
d %>%
gather(key, value, -Codes, -areas) %>%
unite(combined, Codes, key, sep = "") %>%
spread(combined, value)
Where d is your data.
To explain the steps:
library(tidyr)
# setting up data
d <- readr::read_delim("Codes areas var1 var2 var3
1111 1010 2 2 34
1112 1010 3 7 18
1113 1010 20 12 11
1114 1010 19 11 22
1111 1020 14 19 12
1112 1020 10 10 13", delim = " ")
First you need to gather the var1, var2, var3 columns:
d %>%
gather(key, value, -Codes, -areas)
#> Source: local data frame [18 x 4]
#>
#> Codes areas key value
#> (int) (int) (fctr) (int)
#> 1 1111 1010 var1 2
#> 2 1112 1010 var1 3
#> 3 1113 1010 var1 20
#> 4 1114 1010 var1 19
#> 5 1111 1020 var1 14
#> 6 1112 1020 var1 10
#> 7 1111 1010 var2 2
#> 8 1112 1010 var2 7
#> 9 1113 1010 var2 12
#> 10 1114 1010 var2 11
#> 11 1111 1020 var2 19
#> 12 1112 1020 var2 10
#> 13 1111 1010 var3 34
#> 14 1112 1010 var3 18
#> 15 1113 1010 var3 11
#> 16 1114 1010 var3 22
#> 17 1111 1020 var3 12
#> 18 1112 1020 var3 13
Then combine them with the Codes column using tidyr's unite:
d %>%
gather(key, value, -Codes, -areas) %>%
unite(combined, Codes, key, sep = "")
#> Source: local data frame [18 x 3]
#>
#> combined areas value
#> (chr) (int) (int)
#> 1 1111var1 1010 2
#> 2 1112var1 1010 3
#> 3 1113var1 1010 20
#> 4 1114var1 1010 19
#> 5 1111var1 1020 14
#> 6 1112var1 1020 10
#> 7 1111var2 1010 2
#> 8 1112var2 1010 7
#> 9 1113var2 1010 12
#> 10 1114var2 1010 11
#> 11 1111var2 1020 19
#> 12 1112var2 1020 10
#> 13 1111var3 1010 34
#> 14 1112var3 1010 18
#> 15 1113var3 1010 11
#> 16 1114var3 1010 22
#> 17 1111var3 1020 12
#> 18 1112var3 1020 13
Now spread will work:
d %>%
gather(key, value, -Codes, -areas) %>%
unite(combined, Codes, key, sep = "") %>%
spread(combined, value)
#> Source: local data frame [2 x 13]
#>
#> areas 1111var1 1111var2 1111var3 1112var1 1112var2 1112var3 1113var1
#> (int) (int) (int) (int) (int) (int) (int) (int)
#> 1 1010 2 2 34 3 7 18 20
#> 2 1020 14 19 12 10 10 13 NA
#> Variables not shown: 1113var2 (int), 1113var3 (int), 1114var1 (int),
#> 1114var2 (int), 1114var3 (int)
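One caveat for a large table: spread can only build the wide layout if every Codes/areas pair occurs at most once; otherwise it stops with an error about duplicate identifiers. A minimal sketch of a check you could run first, assuming dplyr is installed (the name dupes is just illustrative, and the toy data is rebuilt with base R so the snippet stands alone):

```r
library(dplyr)

# Same toy data as above, rebuilt with base R so this snippet is self-contained
d <- read.table(header = TRUE, text = "
Codes areas var1 var2 var3
1111 1010 2 2 34
1112 1010 3 7 18
1113 1010 20 12 11
1114 1010 19 11 22
1111 1020 14 19 12
1112 1020 10 10 13")

# Keep only the Codes/areas pairs that appear more than once
dupes <- d %>%
  count(Codes, areas) %>%
  filter(n > 1)

# Zero rows means each pair is unique, so the reshape is unambiguous
nrow(dupes)
#> [1] 0
```

If dupes is not empty on your real data, you would need to decide how to aggregate or drop the repeated rows before reshaping.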
I was able to do this in the following way, though it may not be the best or most efficient approach:
library(tidyr)
library(dplyr) # for mutate() and select()
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
Codes areas var1 var2 var3
1111 1010 2 2 34
1112 1010 3 7 18
1113 1010 20 12 11
1114 1010 19 11 22
1111 1020 14 19 12
1112 1020 10 10 13')
df_new <-
df %>%
gather(var_type, var_value, -areas, -Codes) %>%
mutate(var_code = paste(Codes, var_type, sep = '_')) %>%
select(-Codes, -var_type) %>%
spread(var_code, var_value)
df_new
# areas 1111_var1 1111_var2 1111_var3 1112_var1 1112_var2 1112_var3 1113_var1 1113_var2 1113_var3 1114_var1 1114_var2 1114_var3
#1 1010 2 2 34 3 7 18 20 12 11 19 11 22
#2 1020 14 19 12 10 10 13 NA NA NA NA NA NA
I hope this helps.
EDIT
Here is a version of the above solution that uses unite instead, as in @David Robinson's answer:
df %>%
gather(var_type, var_value, -areas, -Codes) %>%
unite(NewCode, Codes, var_type, sep = '') %>%
spread(NewCode, var_value)
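For readers on a newer tidyr: gather and spread are superseded there by pivot_longer and pivot_wider. As a hedged sketch, assuming tidyr 1.0.0 or later, the whole reshape can probably be written as a single pivot_wider call. The result name wide is illustrative, and the glue pattern "{Codes}{.value}" is chosen here only to mimic the unite(..., sep = '') names above:

```r
library(tidyr)

# Same toy data as in the answer above
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Codes areas var1 var2 var3
1111 1010 2 2 34
1112 1010 3 7 18
1113 1010 20 12 11
1114 1010 19 11 22
1111 1020 14 19 12
1112 1020 10 10 13")

# One row per area; each Codes/variable pair becomes its own column
wide <- pivot_wider(
  df,
  id_cols     = areas,
  names_from  = Codes,
  values_from = c(var1, var2, var3),
  names_glue  = "{Codes}{.value}"  # e.g. 1111var1, 1112var1, ...
)

wide
```

The columns may come out grouped by variable rather than by code, so reorder them afterwards if the exact layout matters. Code/area combinations that are missing from the input are filled with NA, as in the spread version.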