Currently, my data is in the following format:
Var year Co-1 Co-2 Co-3 ...
A 2018 a j .
A 2017 b k .
A 2016 c l .
B 2018 d m .
B 2017 e n .
B 2016 f o .
C 2018 g p .
C 2017 h q .
C 2016 i r .
. . . . .
. . . .
. . . .
I want to transform it to the following format:
Company year A B C
Co-1 2018 a d g
Co-1 2017 b e h
Co-1 2016 c f i
Co-2 2018 j m p
Co-2 2017 k n q
Co-2 2016 l o r
Co-3 2018 . .
Co-3 2017 . .
Co-3 2016 . .
.
.
.
Essentially the changes are:
By doing this, I want to be able to regress year vs. each of A, B, and C separately, while keeping the Company distinction for each data point, so I can group the data points by company in the finished graph.
Thank you so much!
Get the data in long format and then in wide but with different columns.
library(tidyr)
df %>%
pivot_longer(cols = starts_with("Co")) %>%
pivot_wider(names_from = Var, values_from = value)
# A tibble: 6 x 5
# year name A B C
# <int> <chr> <fct> <fct> <fct>
#1 2018 Co-1 a d g
#2 2018 Co-2 j m p
#3 2017 Co-1 b e h
#4 2017 Co-2 k n q
#5 2016 Co-1 c f i
#6 2016 Co-2 l o r
data
df <- structure(list(Var = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("A", "B", "C"), class = "factor"), year = c(2018L,
2017L, 2016L, 2018L, 2017L, 2016L, 2018L, 2017L, 2016L), `Co-1` =
structure(1:9, .Label = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
class = "factor"), `Co-2` = structure(1:9, .Label = c("j",
"k", "l", "m", "n", "o", "p", "q", "r"), class = "factor")), class = "data.frame",
row.names = c(NA, -9L))
I think it would be much easier to simply use factors here. Length of labels should be the number of unique companies, ie "Var"
values
df$Var <- factor(df$Var, labels=c("Pepsi", "Coke", "Sprite"))
names(df) <- c("company", "year", LETTERS[seq(names(df)[-(1:2)])])
Or in a single step:
df <- setNames(transform(df, Var=factor(Var, labels=c("Pepsi", "Coke", "Sprite"))),
c("company", "year", LETTERS[seq(names(df)[-(1:2)])]))
df
# company year A B C
# 1 Pepsi 2018 a j .
# 2 Pepsi 2017 b k .
# 3 Pepsi 2016 c l .
# 4 Coke 2018 d m .
# 5 Coke 2017 e n .
# 6 Coke 2016 f o .
# 7 Sprite 2018 g p .
# 8 Sprite 2017 h q .
# 9 Sprite 2016 i r .
Also produces cleaner classes:
sapply(df, class)
# company year A B C
# "factor" "integer" "character" "character" "character"
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.