Reshaping Data with values in variable names
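I have a very wide dataset (2000+ variables) that I am trying to make tidy, but I am stuck trying to extract values from the variable names. If I have a variable "E1Time1_Date", I would like to reshape it into three variables: E = 1, Time = 1, and Date = the original date value.
Is this possible? I have tried using gather(), but I think there is a step I need to do first that I am missing. Thanks for your help!
If anyone wants to work some magic, here is a sample dataset: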
structure(list(ID = c(123, 225), UnrelatedV1 = c("Unrelated1",
"Unrelated1"), UnrelatedV2 = c("Unrelated2", "Unrelated2"), E1T1_Date = structure(c(1506816000,
1513296000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
E1T1_v1 = c(10, 20), E1T1_v2 = c(20, 20), E1T1_v3 = c(30,
20), E1T1_v4 = c(40, 20), E1T2_Date = structure(c(1512086400,
NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), E1T2_v1 = c(10,
NA), E1T2_v2 = c(10, NA), E1T2_v3 = c(10, NA), E1T2_v4 = c(10,
NA), E2T1_Date = structure(c(1522540800, 1525132800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), E2T1_v1 = c(10, 20), E2T1_v2 = c(20,
20), E2T1_v3 = c(10, 20), E2T1_v4 = c(10, 20), E2T2_Date = structure(c(1533859200,
NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), E2T2_v1 = c(10,
NA), E2T2_v2 = c(30, NA), E2T2_v3 = c(10, NA), E2T2_v4 = c(10,
NA)), .Names = c("ID", "UnrelatedV1", "UnrelatedV2", "E1T1_Date",
"E1T1_v1", "E1T1_v2", "E1T1_v3", "E1T1_v4", "E1T2_Date", "E1T2_v1",
"E1T2_v2", "E1T2_v3", "E1T2_v4", "E2T1_Date", "E2T1_v1", "E2T1_v2",
"E2T1_v3", "E2T1_v4", "E2T2_Date", "E2T2_v1", "E2T2_v2", "E2T2_v3",
"E2T2_v4"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-2L))
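It looks like you have a mix of numeric and date values, which makes gathering a little tricky. One approach is to temporarily convert the dates to numeric, then change them back once the data is in its final format. This should get you started.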
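The round trip this approach relies on can be checked on a single value before running it on the whole dataset. The following is a minimal sketch in base R, separate from the solution code; the names `d`, `n`, and `back` are illustrative only.

```r
# A POSIXct value is stored as seconds since 1970-01-01 00:00:00 UTC.
d <- as.POSIXct("2017-10-01", tz = "UTC")

# Dropping the class leaves a plain number that can share a column
# with the other numeric measurements.
n <- as.numeric(d)

# Restore the date-time afterwards; origin and tz are given explicitly.
back <- as.POSIXct(n, origin = "1970-01-01", tz = "UTC")

n          # 1506816000, the first E1T1_Date value in the sample data
back == d  # TRUE
```

The same `as.POSIXct(..., origin = "1970-01-01", tz = "UTC")` call can be applied to the date values once the reshaping is finished.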
library(tidyverse)
library(lubridate)  # for is.POSIXct(), in case your tidyverse version does not attach lubridate

# `data` is assumed to hold the tibble built by the structure(...) call in the question
data %>%
  # convert dates to numeric so we can gather them in the same column
  mutate_if(is.POSIXct, as.numeric) %>%
  gather(-ID, -contains("Unrelated"), key = variable, value = value) %>%
  # add an underscore between E and T to make separating them easier;
  # sub() is vectorised and replaces only the first "T" in each name
  mutate(variable = sub("T", "_T", variable)) %>%
  # separate into three distinct columns
  separate(variable, into = c("E", "T", "vDate"), sep = "_")
# A tibble: 40 x 7
ID UnrelatedV1 UnrelatedV2 E T vDate value
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
1 123 Unrelated1 Unrelated2 E1 T1 Date 1506816000
2 225 Unrelated1 Unrelated2 E1 T1 Date 1513296000
3 123 Unrelated1 Unrelated2 E1 T1 v1 10
4 225 Unrelated1 Unrelated2 E1 T1 v1 20
5 123 Unrelated1 Unrelated2 E1 T1 v2 20
6 225 Unrelated1 Unrelated2 E1 T1 v2 20
7 123 Unrelated1 Unrelated2 E1 T1 v3 30
8 225 Unrelated1 Unrelated2 E1 T1 v3 20
9 123 Unrelated1 Unrelated2 E1 T1 v4 40
10 225 Unrelated1 Unrelated2 E1 T1 v4 20
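If a newer tidyr is available (1.0.0 or later, which is an assumption about your setup), `pivot_longer()` may be able to do the split in one step without converting the dates at all. This is a different technique from the gather() pipeline above and is not part of the original answer. The special `.value` name keeps Date and the v columns as separate columns with their own types. A minimal sketch on a reduced, hypothetical subset of the sample data (`df` and `tidy` are illustrative names):

```r
library(dplyr)
library(tidyr)

# Reduced stand-in for the question's data: one E, two T, one v column
df <- tibble(
  ID = c(123, 225),
  E1T1_Date = as.POSIXct(c(1506816000, 1513296000),
                         origin = "1970-01-01", tz = "UTC"),
  E1T1_v1 = c(10, 20),
  E1T2_Date = as.POSIXct(c(1512086400, NA),
                         origin = "1970-01-01", tz = "UTC"),
  E1T2_v1 = c(10, NA)
)

tidy <- df %>%
  pivot_longer(
    cols = matches("^E[0-9]+T[0-9]+_"),
    # the two captured numbers become E and T; the suffix after "_"
    # (".value") becomes the name of an output column
    names_to = c("E", "T", ".value"),
    names_pattern = "^E([0-9]+)T([0-9]+)_(.*)$"
  ) %>%
  # the captured pieces arrive as character, so convert them to integers
  mutate(E = as.integer(E), T = as.integer(T))

tidy
```

Each row is then one ID at one E/T combination, with Date still stored as POSIXct, so nothing has to be converted back afterwards.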