
Reshaping Data with values in variable names

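I have a very wide dataset (2000+ variables) that I'm trying to tidy, but I'm stuck trying to pull a value out of a variable name. If I have a variable "E1Time1_Date", I'd like to reshape it into three variables: E = 1, Time = 1, Date = the original date value.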

Is this possible? I've tried using gather(), but I think I'm missing a step that needs to happen first. Thanks for your help!


If anyone wants to work some magic, here is a sample dataset:

structure(list(ID = c(123, 225), UnrelatedV1 = c("Unrelated1", 
"Unrelated1"), UnrelatedV2 = c("Unrelated2", "Unrelated2"), E1T1_Date = structure(c(1506816000, 
1513296000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    E1T1_v1 = c(10, 20), E1T1_v2 = c(20, 20), E1T1_v3 = c(30, 
    20), E1T1_v4 = c(40, 20), E1T2_Date = structure(c(1512086400, 
    NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), E1T2_v1 = c(10, 
    NA), E1T2_v2 = c(10, NA), E1T2_v3 = c(10, NA), E1T2_v4 = c(10, 
    NA), E2T1_Date = structure(c(1522540800, 1525132800), class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), E2T1_v1 = c(10, 20), E2T1_v2 = c(20, 
    20), E2T1_v3 = c(10, 20), E2T1_v4 = c(10, 20), E2T2_Date = structure(c(1533859200, 
    NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), E2T2_v1 = c(10, 
    NA), E2T2_v2 = c(30, NA), E2T2_v3 = c(10, NA), E2T2_v4 = c(10, 
    NA)), .Names = c("ID", "UnrelatedV1", "UnrelatedV2", "E1T1_Date", 
"E1T1_v1", "E1T1_v2", "E1T1_v3", "E1T1_v4", "E1T2_Date", "E1T2_v1", 
"E1T2_v2", "E1T2_v3", "E1T2_v4", "E2T1_Date", "E2T1_v1", "E2T1_v2", 
"E2T1_v3", "E2T1_v4", "E2T2_Date", "E2T2_v1", "E2T2_v2", "E2T2_v3", 
"E2T2_v4"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-2L))

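It looks like you have a mix of numeric and date values, which makes gathering a bit tricky. One approach is to temporarily convert the dates to numeric, then convert them back once the data is in its final shape. This should get you started.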

library(tidyverse)
library(lubridate)  # provides is.POSIXct(); not attached by tidyverse versions before 2.0.0

# `data` is the sample tibble from the question (assign the dput() output to `data`)
data %>%
  # convert dates to numeric so they can be gathered into the same column as the other values
  mutate_if(is.POSIXct, as.numeric) %>%
  gather(-ID, -contains("Unrelated"), key = variable, value = value) %>%

  # add an underscore between the E and T parts to make separating them easier
  # (sub() replaces only the first "T" in each name, and is vectorised over the column)
  mutate(variable = sub("T", "_T", variable, fixed = TRUE)) %>%

  # separate into three distinct columns
  separate(variable, into = c("E", "T", "vDate"), sep = "_")

# A tibble: 40 x 7
      ID UnrelatedV1 UnrelatedV2     E     T vDate      value
   <dbl>       <chr>       <chr> <chr> <chr> <chr>      <dbl>
 1   123  Unrelated1  Unrelated2    E1    T1  Date 1506816000
 2   225  Unrelated1  Unrelated2    E1    T1  Date 1513296000
 3   123  Unrelated1  Unrelated2    E1    T1    v1         10
 4   225  Unrelated1  Unrelated2    E1    T1    v1         20
 5   123  Unrelated1  Unrelated2    E1    T1    v2         20
 6   225  Unrelated1  Unrelated2    E1    T1    v2         20
 7   123  Unrelated1  Unrelated2    E1    T1    v3         30
 8   225  Unrelated1  Unrelated2    E1    T1    v3         20
 9   123  Unrelated1  Unrelated2    E1    T1    v4         40
10   225  Unrelated1  Unrelated2    E1    T1    v4         20
# ... with 30 more rows
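
For what it's worth, newer versions of tidyr may let you skip the numeric round trip entirely. The sketch below is not the approach used above: it swaps gather()/separate() for pivot_longer() with the special ".value" name, and it assumes tidyr >= 1.1.0 (for names_transform). The small `data` tibble is a cut-down stand-in for the question's data with the same naming scheme, and `tidy` is just an illustrative name. Because each suffix (Date, v1, ...) becomes its own column, dates and numbers never share a column, so the dates keep their POSIXct class.

```r
library(tidyr)
library(tibble)

# Cut-down stand-in for the question's data: same E<n>T<n>_<suffix> naming, fewer columns
data <- tibble(
  ID = c(123, 225),
  UnrelatedV1 = c("Unrelated1", "Unrelated1"),
  E1T1_Date = as.POSIXct(c(1506816000, 1513296000), origin = "1970-01-01", tz = "UTC"),
  E1T1_v1 = c(10, 20),
  E1T2_Date = as.POSIXct(c(1512086400, NA), origin = "1970-01-01", tz = "UTC"),
  E1T2_v1 = c(10, NA)
)

tidy <- pivot_longer(
  data,
  # only reshape columns that follow the E<n>T<n>_ naming scheme
  cols = matches("^E\\d+T\\d+_"),
  # first two capture groups become the E and T columns;
  # ".value" turns the third group (Date, v1, ...) into output column names
  names_to = c("E", "T", ".value"),
  names_pattern = "^E(\\d+)T(\\d+)_(.+)$",
  # store E and T as integers rather than character strings
  names_transform = list(E = as.integer, T = as.integer)
)

print(tidy)
```

This should give one row per ID / E / T combination, with Date and v1 as ordinary columns. On the full dataset the same call ought to work unchanged as long as every reshaped column matches the pattern; columns that don't match (ID, the Unrelated ones) are carried along as identifiers.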
