How to melt data from wide format to long format using similar column names in R?

Recently, I scraped data from a website; it resembles the data frame in the input variable below.

input <- data.frame(
     "Date" = sprintf("%02d-Jan", 1:15),
     "Type_event_1" =  c(rep("Skiing", 3), rep("Marathon", 7), rep("Skating", 5)),
     "sport_event_1"= c(rep("Alpine skiing",4), rep("Biathlon",6), rep("Curling",3), rep("Figure skating",2)),
     "Type_event_2" =  c(rep("Skiing", 4), rep("Marathon", 6),rep("Ice-Hockey", 3), rep("Skating", 2)),
     "sport_event_2"= c(rep("Skeleton",4), rep("Luge",6), rep("Hockey",3), rep("Ski Jumping",2))
     )

I want to rbind the columns that share a common suffix ("event_1", "event_2") one below the other, along with the 'Date' column. Here I have only 4 columns (2 events), but what if I had 40 columns (20 such events)? How can I do this using a for loop?
The expected output looks like this:

expected_output <- data.frame(
  "Date" = rep(sprintf("%02d-Jan", 1:15),2),
  "Type_event_1" =  c(rep("Skiing", 3), rep("Marathon", 7), rep("Skating", 5),rep("Skiing", 4), rep("Marathon", 6),rep("Ice-Hockey", 3), rep("Skating", 2)),
  "sport_event_1"= c(rep("Alpine skiing",4), rep("Biathlon",6), rep("Curling",3), rep("Figure skating",2),rep("Skeleton",4), rep("Luge",6), rep("Hockey",3), rep("Ski Jumping",2))
)

Try

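library(data.table)
dt <- as.data.table(input)  # melt() from data.table expects a data.table
# Melt each column family separately, keeping only Date and the value column
out1 <- melt(dt, id.vars = "Date", measure.vars = grep("^Type_event_", names(dt)))[, c(1, 3)]
out2 <- melt(dt, id.vars = "Date", measure.vars = grep("^sport_event_", names(dt)))[, c(1, 3)]
# Both melts keep the same row order, so the value columns line up
final <- cbind(out1, out2[, -1])
names(final) <- c("Date", "Type_event", "sport_event")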
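
For many events (say 20), the two separate melts can likely be collapsed into one call. The following is a minimal sketch, assuming the columns always follow the Type_event_<n> / sport_event_<n> naming from the question; the names long_dt and event are illustrative and not part of the original answer.

```r
library(data.table)

# Sample data copied from the question
input <- data.frame(
  Date = sprintf("%02d-Jan", 1:15),
  Type_event_1 = c(rep("Skiing", 3), rep("Marathon", 7), rep("Skating", 5)),
  sport_event_1 = c(rep("Alpine skiing", 4), rep("Biathlon", 6), rep("Curling", 3), rep("Figure skating", 2)),
  Type_event_2 = c(rep("Skiing", 4), rep("Marathon", 6), rep("Ice-Hockey", 3), rep("Skating", 2)),
  sport_event_2 = c(rep("Skeleton", 4), rep("Luge", 6), rep("Hockey", 3), rep("Ski Jumping", 2))
)

# patterns() selects each column family by regex, so the same call
# covers 2 events or 20 without listing the columns by hand
long_dt <- melt(
  as.data.table(input),
  id.vars = "Date",
  measure.vars = patterns("^Type_event_", "^sport_event_"),
  value.name = c("Type_event", "sport_event"),
  variable.name = "event"  # illustrative name: index of the event within each family
)
```

The event column records which event each row came from; drop it with long_dt[, event := NULL] if only the three columns from the expected output are wanted.
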
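
Alternatively, with the tidyverse:

library(tidyverse)

tbl_df(input) %>%
  unite(v1, Type_event_1, sport_event_1) %>%   # paste event 1 columns as "Type_sport"
  unite(v2, Type_event_2, sport_event_2) %>%   # same for event 2
  gather(v1, v2, -Date) %>%                    # stack both; key goes to v1, values to v2
  separate(v2, c("Type_event", "sport_event"), sep = "_") %>%  # split the values back apart
  select(-v1)                                  # drop the key column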

# # A tibble: 30 x 3
#     Date   Type_event sport_event  
#    <fct>  <chr>      <chr>        
# 1 01-Jan Skiing     Alpine skiing
# 2 02-Jan Skiing     Alpine skiing
# 3 03-Jan Skiing     Alpine skiing
# 4 04-Jan Marathon   Alpine skiing
# 5 05-Jan Marathon   Biathlon     
# 6 06-Jan Marathon   Biathlon     
# 7 07-Jan Marathon   Biathlon     
# 8 08-Jan Marathon   Biathlon     
# 9 09-Jan Marathon   Biathlon     
#10 10-Jan Marathon   Biathlon     
# # ... with 20 more rows

Note: I'm using tbl_df(input) only for visualisation purposes. You can just use input %>% ... instead.
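
The unite/gather approach above needs one unite() call per event, which gets tedious with 20 events. A possible alternative is tidyr's pivot_longer() with the ".value" sentinel, sketched below under the assumption that every column other than Date is named <family>_event_<n>; the names long_tidy and event are illustrative. No for loop is needed.

```r
library(tidyr)

# Sample data copied from the question
input <- data.frame(
  Date = sprintf("%02d-Jan", 1:15),
  Type_event_1 = c(rep("Skiing", 3), rep("Marathon", 7), rep("Skating", 5)),
  sport_event_1 = c(rep("Alpine skiing", 4), rep("Biathlon", 6), rep("Curling", 3), rep("Figure skating", 2)),
  Type_event_2 = c(rep("Skiing", 4), rep("Marathon", 6), rep("Ice-Hockey", 3), rep("Skating", 2)),
  sport_event_2 = c(rep("Skeleton", 4), rep("Luge", 6), rep("Hockey", 3), rep("Ski Jumping", 2))
)

# Split each column name into the family ("Type_event" / "sport_event"),
# which becomes an output column via ".value", and the event number
long_tidy <- pivot_longer(
  input,
  cols = -Date,
  names_to = c(".value", "event"),
  names_pattern = "^(.*_event)_(\\d+)$",
  names_transform = list(event = as.integer)  # so event 10 sorts after event 2
)

# pivot_longer() interleaves events row by row; reorder so that all of
# event 1 comes first, then event 2, as in the expected output
long_tidy <- long_tidy[order(long_tidy$event), ]
```

The result has the columns Date, event, Type_event and sport_event; the event column can be dropped if it is not needed.
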
