How to melt data from wide format to long format using similar column names in R?
Recently, I scraped data from a website, and it resembles the data frame stored in the input variable below.
input <- data.frame(
"Date" = sprintf("%02d-Jan", 1:15),
"Type_event_1" = c(rep("Skiing", 3), rep("Marathon", 7), rep("Skating", 5)),
"sport_event_1"= c(rep("Alpine skiing",4), rep("Biathlon",6), rep("Curling",3), rep("Figure skating",2)),
"Type_event_2" = c(rep("Skiing", 4), rep("Marathon", 6),rep("Ice-Hockey", 3), rep("Skating", 2)),
"sport_event_2"= c(rep("Skeleton",4), rep("Luge",6), rep("Hockey",3), rep("Ski Jumping",2))
)
I want to rbind the columns that share a common suffix ("event_1", "event_2") one below the other, along with the 'Date' column. In this case I have just 4 columns, i.e. 2 events, but what if I had 40 columns, i.e. 20 such events? How can I do this using a for loop?
The expected output looks like this:
expected_output <- data.frame(
"Date" = rep(sprintf("%02d-Jan", 1:15),2),
"Type_event_1" = c(rep("Skiing", 3), rep("Marathon", 7), rep("Skating", 5),rep("Skiing", 4), rep("Marathon", 6),rep("Ice-Hockey", 3), rep("Skating", 2)),
"sport_event_1"= c(rep("Alpine skiing",4), rep("Biathlon",6), rep("Curling",3), rep("Figure skating",2),rep("Skeleton",4), rep("Luge",6), rep("Hockey",3), rep("Ski Jumping",2))
)
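For reference, the for loop asked about above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data frame here is a cut-down stand-in for input with the same column naming scheme, and the names events, pieces and stacked are hypothetical helpers, not part of the original post.

```r
# Cut-down stand-in for the question's `input` (same column naming scheme)
input <- data.frame(
  Date          = c("01-Jan", "02-Jan"),
  Type_event_1  = c("Skiing", "Marathon"),
  sport_event_1 = c("Alpine skiing", "Biathlon"),
  Type_event_2  = c("Skating", "Skiing"),
  sport_event_2 = c("Curling", "Luge"),
  stringsAsFactors = FALSE
)

# Event numbers, taken from the suffix of the Type_event_* column names
events <- sub("^Type_event_", "", grep("^Type_event_", names(input), value = TRUE))

pieces <- vector("list", length(events))
for (i in seq_along(events)) {
  # The two columns belonging to this event, e.g. Type_event_1 and sport_event_1
  cols  <- paste0(c("Type_event_", "sport_event_"), events[i])
  piece <- input[, c("Date", cols)]
  # Give every piece the same column names so rbind() can stack them
  names(piece) <- c("Date", "Type_event", "sport_event")
  pieces[[i]] <- piece
}

# Stack the per-event pieces one below the other
stacked <- do.call(rbind, pieces)
```

Because the event numbers are read from the column names, the same loop should cover 20 events without changes, provided every Type_event_N column has a matching sport_event_N column.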
Try:
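library(data.table)

# data.table's melt() is designed for data.tables, so convert the data frame first
dt <- as.data.table(input)

# Stack the Type_event_* columns and, separately, the sport_event_* columns
type_cols  <- grep("^Type_event_",  names(dt), value = TRUE)
sport_cols <- grep("^sport_event_", names(dt), value = TRUE)
out1 <- melt(dt, id.vars = "Date", measure.vars = type_cols)
out2 <- melt(dt, id.vars = "Date", measure.vars = sport_cols)

# Both results come back in the same row order, so the value columns line up
final <- data.table(Date = out1$Date, Type_event = out1$value, sport_event = out2$value)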
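The two melt() calls above can probably be collapsed into one by passing patterns() to measure.vars, which melts several groups of columns at once. The following is a minimal sketch, assuming a reasonably recent data.table; the small dt table is a stand-in for the question's input, and long is a hypothetical result name.

```r
library(data.table)

# Cut-down stand-in for the question's `input`, built directly as a data.table
dt <- data.table(
  Date          = c("01-Jan", "02-Jan"),
  Type_event_1  = c("Skiing", "Marathon"),
  sport_event_1 = c("Alpine skiing", "Biathlon"),
  Type_event_2  = c("Skating", "Skiing"),
  sport_event_2 = c("Curling", "Luge")
)

# Each pattern selects one group of columns; each group becomes one value column
long <- melt(
  dt,
  id.vars      = "Date",
  measure.vars = patterns("^Type_event_", "^sport_event_"),
  value.name   = c("Type_event", "sport_event")
)

# Drop the event index column that melt() adds
long[, variable := NULL]
```

Since the patterns match on the column prefixes, adding more Type_event_N / sport_event_N pairs should not require any change to the call.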
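Alternatively, using the tidyverse:
library(tidyverse)
as_tibble(input) %>%  # as_tibble() replaces the deprecated tbl_df()
  unite(v1, Type_event_1, sport_event_1) %>%  # paste each event's pair into one column
  unite(v2, Type_event_2, sport_event_2) %>%
  gather(event, value, -Date) %>%  # stack v1 and v2 one below the other
  separate(value, c("Type_event", "sport_event"), sep = "_") %>%  # split the pair again
  select(-event)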
# # A tibble: 30 x 3
# Date Type_event sport_event
# <fct> <chr> <chr>
# 1 01-Jan Skiing Alpine skiing
# 2 02-Jan Skiing Alpine skiing
# 3 03-Jan Skiing Alpine skiing
# 4 04-Jan Marathon Alpine skiing
# 5 05-Jan Marathon Biathlon
# 6 06-Jan Marathon Biathlon
# 7 07-Jan Marathon Biathlon
# 8 08-Jan Marathon Biathlon
# 9 09-Jan Marathon Biathlon
#10 10-Jan Marathon Biathlon
# # ... with 20 more rows
Note: I'm converting input to a tibble at the start of the pipeline only for visualisation purposes. You can use just input %>% ...
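The unite/gather/separate pipeline above needs one unite() call per event, which gets tedious with 20 events. A possible alternative is pivot_longer() with the ".value" sentinel, sketched below. This assumes tidyr 1.0.0 or later; the data frame is a cut-down stand-in for input, and long is a hypothetical result name.

```r
library(tidyr)

# Cut-down stand-in for the question's `input` (same column naming scheme)
input <- data.frame(
  Date          = c("01-Jan", "02-Jan"),
  Type_event_1  = c("Skiing", "Marathon"),
  sport_event_1 = c("Alpine skiing", "Biathlon"),
  Type_event_2  = c("Skating", "Skiing"),
  sport_event_2 = c("Curling", "Luge"),
  stringsAsFactors = FALSE
)

# Split each column name into an output column name (".value") and an event number
long <- pivot_longer(
  input,
  cols          = -Date,
  names_to      = c(".value", "event"),
  names_pattern = "^(.*_event)_(\\d+)$"
)

# pivot_longer() interleaves events per date; reorder so event 1 comes first,
# then keep only the three columns from the expected output
long <- long[order(as.integer(long$event)), c("Date", "Type_event", "sport_event")]
```

The regular expression is the only part tied to the naming scheme, so it would need adjusting if the real scraped columns are named differently.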