I have the data frame below in R and have found a way to transpose it, but I would like to optimize the code.
My data frame looks like this:
EVENT NUMBER HOURS_PCT CYCLES_PCT
A23 -17 2 2
A23 -18 3 3
A23 -19 4 4
A23 -20 5 5
A23 -21 6 6
A23 -22 7 7
A23 -23 8 8
A23 -24 9 9
A23 -25 10 10
CD12 -1 11 11
CD12 -2 12 12
CD12 -3 13 13
CD12 -4 14 14
CD12 -5 15 15
CD12 -6 16 16
CD12 -7 17 17
The code below builds the data frame and gives the correct result:
library(dplyr)
library(tidyr)

# Build the example data frame (NUMBER is stored as text)
EVENT <- c('A23','A23','A23','A23','A23','A23','A23','A23','A23','CD12','CD12','CD12','CD12','CD12','CD12','CD12')
NUMBER <- c('-17','-18','-19','-20','-21','-22','-23','-24','-25','-1','-2','-3','-4','-5','-6','-7')
HOURS_PCT <- seq(from = 2, to = 17, by = 1)
CYCLES_PCT <- seq(from = 2, to = 17, by = 1)
df <- data.frame(EVENT, NUMBER, HOURS_PCT, CYCLES_PCT)

# Top 5 NUMBER per EVENT, numbered 1..5, once for each PCT column
df_1h <- df %>%
  arrange(EVENT, NUMBER, HOURS_PCT) %>%
  group_by(EVENT) %>% top_n(5, NUMBER) %>%
  mutate(SEQ = row_number())
df_1c <- df %>%
  arrange(EVENT, NUMBER, CYCLES_PCT) %>%
  group_by(EVENT) %>% top_n(5, NUMBER) %>%
  mutate(SEQ = row_number())

# Drop the columns that are not needed before spreading
df_1h$NUMBER <- NULL; df_1h$CYCLES_PCT <- NULL
df_1c$NUMBER <- NULL; df_1c$HOURS_PCT <- NULL

# Spread each table to wide format and bind them side by side
df_1h_t <- spread(df_1h, SEQ, HOURS_PCT, fill = "")
df_1c_t <- spread(df_1c, SEQ, CYCLES_PCT, fill = "")
df_final <- cbind(df_1h_t, df_1c_t)
df_final$EVENT1 <- NULL
I find it very manual and wonder if it can be optimized. I tried adding gather and spread into my piping commands but they never worked.
I think what you want can be achieved by first converting the columns that end with "PCT" to long format, selecting the top 5 NUMBER in each EVENT and column, creating a unique row identifier, and then getting the data back in wide format.
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = ends_with('PCT')) %>%
  group_by(EVENT, name) %>%
  top_n(5, NUMBER) %>%
  group_by(EVENT) %>%
  mutate(SEQ = row_number()) %>%
  select(-NUMBER, -name) %>%
  pivot_wider(names_from = SEQ, values_from = value)
# EVENT `1` `2` `3` `4` `5` `6` `7` `8` `9` `10`
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A23 6 6 7 7 8 8 9 9 10 10
#2 CD12 13 13 14 14 15 15 16 16 17 17
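The output above interleaves the HOURS_PCT and CYCLES_PCT values under the names 1 to 10. If you would rather see which source column each value came from, one possible variation is to number the rows within each EVENT and column and build the wide names from both name and SEQ. This is only a sketch, assuming tidyr 1.0.0 or later; the object name wide and the compact construction of df are just for illustration, and NUMBER is kept as text as in the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built compactly (NUMBER stays character)
df <- data.frame(
  EVENT = rep(c("A23", "CD12"), times = c(9, 7)),
  NUMBER = as.character(c(-17:-25, -1:-7)),
  HOURS_PCT = 2:17,
  CYCLES_PCT = 2:17,
  stringsAsFactors = FALSE
)

wide <- df %>%
  pivot_longer(cols = ends_with("PCT")) %>%
  group_by(EVENT, name) %>%
  top_n(5, NUMBER) %>%
  mutate(SEQ = row_number()) %>%   # 1..5 within each EVENT and column
  ungroup() %>%
  select(-NUMBER) %>%
  # Column names become HOURS_PCT_1, CYCLES_PCT_1, ... (default separator "_")
  pivot_wider(names_from = c(name, SEQ), values_from = value)

wide
```

The result has one row per EVENT and ten value columns. Their left-to-right order may depend on the tidyr version, so select them by name if the order matters.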
pivot_longer and pivot_wider are the successors of gather and spread. If you haven't updated tidyr yet, you can do the same with gather and spread:
df %>%
  gather(name, value, ends_with('PCT')) %>%
  group_by(EVENT, name) %>%
  top_n(5, NUMBER) %>%
  group_by(EVENT) %>%
  mutate(SEQ = row_number()) %>%
  select(-NUMBER, -name) %>%
  spread(SEQ, value)
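One thing to keep in mind with either version: NUMBER is stored as text in the question, so top_n(5, NUMBER) ranks the values as strings rather than as numbers. The small sketch below (the names d, as_text and as_num are made up for illustration) compares the two on the A23 rows. Whether you want the text or the numeric ranking depends on what "top 5" should mean for your data.

```r
library(dplyr)

# The A23 rows from the question, with NUMBER kept as character
d <- data.frame(
  NUMBER = as.character(-17:-25),
  HOURS_PCT = 2:10,
  stringsAsFactors = FALSE
)

# Ranked as text: picks the strings that sort last ("-21" to "-25")
as_text <- top_n(d, 5, NUMBER)$HOURS_PCT

# Ranked as numbers: picks the five largest values (-17 to -21)
as_num <- top_n(mutate(d, NUMBER = as.numeric(NUMBER)), 5, NUMBER)$HOURS_PCT

as_text
as_num
```

The text ranking corresponds to the 6 to 10 values shown for A23 in the output above, while the numeric ranking would keep the rows with HOURS_PCT 2 to 6 instead.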