
Column manipulation in R - data stored as column name


I have a data frame with an unusual format, where some of the information is stored as part of the column names.

library(tidyverse)

Ihave <- tribble(
  ~ID,~group,~AAA_info2_BBB,~CCC_info3_DDD,
  "first",  1, as.Date("1970-01-01"), as.Date("1970-01-02"),
  "second", 2, as.Date("1971-01-01"), as.Date("1971-01-02"),
  "third",  3, as.Date("1972-01-01"), as.Date("1972-01-02"),
)

# A tibble: 3 x 4
  ID     group AAA_info2_BBB CCC_info3_DDD
  <chr>  <dbl> <date>        <date>       
1 first      1 1970-01-01    1970-01-02   
2 second     2 1971-01-01    1971-01-02   
3 third      3 1972-01-01    1972-01-02   

I need to get the information into a data frame like this:

Iwant <- tribble(
  ~ID,~group,~source,~variable,~value,~period,
  "first",  1, "AAA", "info2", as.Date("1970-01-01"), "BBB",
  "second", 2, "AAA", "info2", as.Date("1971-01-01"), "BBB",
  "third",  3, "AAA", "info2", as.Date("1972-01-01"), "BBB",
  "first",  1, "CCC", "info3", as.Date("1970-01-02"), "DDD",
  "second", 2, "CCC", "info3", as.Date("1971-01-02"), "DDD",
  "third",  3, "CCC", "info3", as.Date("1972-01-02"), "DDD",
)

# A tibble: 6 x 6
  ID     group source variable value      period
  <chr>  <dbl> <chr>  <chr>    <date>     <chr> 
1 first      1 AAA    info2    1970-01-01 BBB   
2 second     2 AAA    info2    1971-01-01 BBB   
3 third      3 AAA    info2    1972-01-01 BBB   
4 first      1 CCC    info3    1970-01-02 DDD   
5 second     2 CCC    info3    1971-01-02 DDD   
6 third      3 CCC    info3    1972-01-02 DDD   

I thought I could do this by writing a function that handles one "AAA_info2_BBB"-type column at a time, and the following function seems to work:

my_fun <- function(df, one_var) {

  # Get the column name as a string from the bare (unquoted) argument;
  # as.character() on a quosure is deprecated in current rlang
  one_var_char <- rlang::as_name(rlang::ensym(one_var))

  # Split the name on "_", e.g. "AAA_info2_BBB" -> c("AAA", "info2", "BBB")
  one_var_char_splitted <- strsplit(one_var_char, "_")[[1]]

  # The middle piece becomes the new column name
  new_one_var <- one_var_char_splitted[2]

  names(df)[names(df) == one_var_char] <- new_one_var

  # Keep only the renamed column and add source/period as constant columns
  df %>%
    select(all_of(new_one_var)) %>% 
    data.frame(source = one_var_char_splitted[1],
               period = one_var_char_splitted[3])
}

which returns (as expected):

Ihave %>% 
  select(ID, group, AAA_info2_BBB) %>% 
  my_fun(AAA_info2_BBB)

       info2 source period
1 1970-01-01    AAA    BBB
2 1971-01-01    AAA    BBB
3 1972-01-01    AAA    BBB

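But I can't manage to "map" this function over the Ihave data frame to produce the desired Iwant. I tried several combinations with purrr::map, without success. Is my approach flawed? Am I missing something?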

Any help is appreciated!
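For the map-based route the question asks about, one sketch is to pass the column names as strings rather than bare names, so they can be iterated over directly. The helper split_one_col() is hypothetical (it is not the asker's my_fun), and it assumes every non-identifier column is named source_variable_period with exactly one underscore between the parts.

```r
library(dplyr)
library(purrr)
library(tibble)

Ihave <- tribble(
  ~ID,      ~group, ~AAA_info2_BBB,        ~CCC_info3_DDD,
  "first",  1,      as.Date("1970-01-01"), as.Date("1970-01-02"),
  "second", 2,      as.Date("1971-01-01"), as.Date("1971-01-02"),
  "third",  3,      as.Date("1972-01-01"), as.Date("1972-01-02")
)

# Hypothetical helper: takes the column name as a string, so it can be
# called from map() with the names held in a character vector.
# Assumes names of the form source_variable_period.
split_one_col <- function(df, col) {
  parts <- strsplit(col, "_")[[1]]  # e.g. c("AAA", "info2", "BBB")
  tibble(
    ID       = df$ID,
    group    = df$group,
    source   = parts[1],   # length-1 values are recycled to nrow(df)
    variable = parts[2],
    value    = df[[col]],
    period   = parts[3]
  )
}

# Every column except the identifiers encodes data in its name
data_cols <- setdiff(names(Ihave), c("ID", "group"))

# Apply the helper to each data column and stack the results
Iwant <- bind_rows(map(data_cols, ~ split_one_col(Ihave, .x)))
Iwant
```

Taking a string avoids the tidy-evaluation step that makes the bare-name version awkward to call from map().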

I had this prepared before I saw @aosmith's comment, so here it is:

library(dplyr)
library(tidyr)
Ihave %>%
  gather(source, value, -ID, -group) %>%
  separate(source, into = c("source", "variable", "period"), sep = "_")
# # A tibble: 6 x 6
#   ID     group source variable period value     
#   <chr>  <dbl> <chr>  <chr>    <chr>  <date>    
# 1 first      1 AAA    info2    BBB    1970-01-01
# 2 second     2 AAA    info2    BBB    1971-01-01
# 3 third      3 AAA    info2    BBB    1972-01-01
# 4 first      1 CCC    info3    DDD    1970-01-02
# 5 second     2 CCC    info3    DDD    1971-01-02
# 6 third      3 CCC    info3    DDD    1972-01-02

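It relies on a constant, ordered, and known number of _-delimited fields. If the format never changes, that's fine. Otherwise, you'll need to write something more specific (or more general) to cope with any variation.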
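One way to relax the fixed-field assumption is to match the names with a regular expression instead of a plain split. This sketch assumes, hypothetically, that the source and period are always single tokens at the two ends of the name while the middle part may itself contain underscores; the EEE_info_4_FFF column is made-up data that exercises that case.

```r
library(tibble)
library(tidyr)

# Hypothetical data: the second column's middle part contains "_"
df <- tribble(
  ~ID,     ~group, ~AAA_info2_BBB,        ~EEE_info_4_FFF,
  "first", 1,      as.Date("1970-01-01"), as.Date("1970-01-03")
)

long <- df %>%
  gather(key, value, -ID, -group) %>%
  # Assumed rule: source = text before the first "_", period = text after
  # the last "_", variable = everything in between (may contain "_")
  extract(key, into = c("source", "variable", "period"),
          regex = "^([^_]+)_(.+)_([^_]+)$")
long
```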

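If you're already loading library(tidyverse), there's no need to explicitly call library(dplyr) and library(tidyr). (I include them here in case (a) someone comes along who doesn't explicitly load all 25 packages, or (b) you thought you needed all of them but want to reduce load time by trimming the unused ones.)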
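In current tidyr, gather() is superseded by pivot_longer(), which can reshape and split the column names in a single call. A minimal sketch, assuming tidyr 1.0.0 or later:

```r
library(tibble)
library(tidyr)

Ihave <- tribble(
  ~ID,      ~group, ~AAA_info2_BBB,        ~CCC_info3_DDD,
  "first",  1,      as.Date("1970-01-01"), as.Date("1970-01-02"),
  "second", 2,      as.Date("1971-01-01"), as.Date("1971-01-02"),
  "third",  3,      as.Date("1972-01-01"), as.Date("1972-01-02")
)

# Lengthen every column except the identifiers; names_sep splits each old
# column name on "_" into the three names_to columns, values go to "value"
Iwant <- Ihave %>%
  pivot_longer(-c(ID, group),
               names_to = c("source", "variable", "period"),
               names_sep = "_")
Iwant
```

The rows come out grouped by ID (first/AAA, first/CCC, second/AAA, ...) rather than by source, so an arrange() call can reorder them if the row order of Iwant matters. The same fixed-field caveat as with separate() applies.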

Same idea as gather then separate, but just for variety, here is a data.table approach using melt and tstrsplit:

library(data.table)
setDT(Ihave)

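# melt() to long format, then split the old column names on "_";
# the trailing [] prints the result, since := returns invisibly
melt(Ihave, id.vars = c('ID', 'group'))[, 
  c('source', 'variable', 'period') := tstrsplit(variable, '_')][]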

#        ID group variable      value source period
# 1:  first     1    info2 1970-01-01    AAA    BBB
# 2: second     2    info2 1971-01-01    AAA    BBB
# 3:  third     3    info2 1972-01-01    AAA    BBB
# 4:  first     1    info3 1970-01-02    CCC    DDD
# 5: second     2    info3 1971-01-02    CCC    DDD
# 6:  third     3    info3 1972-01-02    CCC    DDD


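Disclaimer: technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.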

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM