
Is there a way to create factors from a data dictionary in R?

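I am trying to create factors from a data dictionary. I tried using Map, but all the variables were converted to missing values. What is the best way to approach this? A purrr-style solution is also welcome.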

library(dplyr)

mydata <- tibble(
  a_1 = c(20,22, 13,14,44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c(1, 2, 1, 1, 2),
  c = c(1, 2, 1, 3, 1)
)



dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select"),
  values = c(NA, "1, 2", "1, 2,3" ),
  valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")

)

# Expected results 
expectedata <- mydata %>% 
  mutate(
    b = factor(b, levels = c(1, 2), labels = c("Male", "Female")),
    c = factor(c, levels = c(1, 2, 3), 
               labels = c("Primary", "Secondary", "Tertiary"))
  )
expectedata 


# Select the factor variables

factor_vars <- dictionary %>%
  filter(type == "select") %>% pull(variable)


mydata[] <- Map(
  function(x, fctvalues, fctlabels)  factor(x, fctvalues,  fctlabels) ,
                mydata,
                dictionary$values[ match(factor_vars,
                                                 dictionary$variable) ],

                dictionary$valuelabel[ match(factor_vars,
                                             dictionary$variable) ]
)

One approach uses pivot_longer and a left_join against the dictionary.

Data

library(tidyverse)

mydata <- tibble(
    a_1 = c(20,22, 13,14,44),
    a_2 = c(42, 13, 32, 31, 14),
    b = c(1, 2, 1, 1, 2),
    c = c(1, 2, 1, 3, 1)
)



dictionary <- tibble(
    variable = c("a", "b", "c"),
    label = c("Age", "Gender", "Education"),
    type = c("mselect", "select", "select"),
    values = c(NA, "1, 2", "1, 2, 3" ),
    valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")
    
)

Code

target_dictionary <- dictionary %>%
    # optional: filter(type == "select") %>%
    separate_rows(values, valuelabel) %>% 
    select(variable, values, valuelabel)

target_mydata <- mydata %>%
    # Assuming you have no unique identifier
    rownames_to_column("id") %>%
    pivot_longer(
        cols = c("b", "c"),
        names_to = "var_name",
        values_to = "var_value"
    ) %>%
    # because the data types don't match here
    mutate(
        var_value = as.character(var_value)
    ) %>%
    left_join(
        target_dictionary,
        by = c("var_name" = "variable", "var_value" = "values")
    ) %>%
    pivot_wider(
        names_from = var_name,
        values_from = valuelabel, 
        id_cols = c("id", "a_1", "a_2")
    ) %>%
    select(-id)

Results:

> target_mydata
# A tibble: 5 × 4
    a_1   a_2 b      c        
  <dbl> <dbl> <chr>  <chr>    
1    20    42 Male   Primary  
2    22    13 Female Secondary
3    13    32 Male   Primary  
4    14    31 Male   Tertiary 
5    44    14 Female Primary  
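
Note that the join above returns character columns rather than factors. If real factors are needed, as in the question's expected output, one option is to repair the original Map call. That call most likely produced only NA values for two reasons: it iterated over all four columns of mydata rather than just b and c, and it passed each dictionary string (for example "1, 2") as a single level instead of splitting it first. The following is a minimal sketch, assuming the values and labels in the dictionary are always comma-separated; `split_csv` and `factored` are hypothetical names introduced only for this illustration.

```r
library(dplyr)

mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c(1, 2, 1, 1, 2),
  c = c(1, 2, 1, 3, 1)
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select"),
  values = c(NA, "1, 2", "1, 2,3"),
  valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")
)

# Hypothetical helper: turn "1, 2,3" into c("1", "2", "3")
split_csv <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])

# Keep only single-select variables that actually exist in the data
select_dict <- filter(dictionary, type == "select")
factor_vars <- intersect(select_dict$variable, names(mydata))
idx <- match(factor_vars, select_dict$variable)

# Apply factor() column by column, with levels and labels split into vectors
factored <- mydata
factored[factor_vars] <- Map(
  function(x, vals, labs) {
    factor(x, levels = split_csv(vals), labels = split_csv(labs))
  },
  mydata[factor_vars],
  select_dict$values[idx],
  select_dict$valuelabel[idx]
)
```

Because the levels come from the dictionary, the factor order follows the dictionary order (Primary, Secondary, Tertiary) rather than alphabetical order.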


Edit: You can also go a step further and rename the factor columns.

Rename columns

target_mydata %>%
    rename_with(
        .fn = ~ setNames(dictionary$label, dictionary$variable)[.x], 
        .cols = intersect(names(mydata), setNames(dictionary$variable, dictionary$label))
    )

Results:

# A tibble: 5 × 4
    a_1   a_2 Gender Education
  <dbl> <dbl> <chr>  <chr>    
1    20    42 Male   Primary  
2    22    13 Female Secondary
3    13    32 Male   Primary  
4    14    31 Male   Tertiary 
5    44    14 Female Primary  
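
Since the question also asked for a purrr approach, here is one possible sketch that builds the factor columns with `pmap()`, iterating over the dictionary rows instead of reshaping the data. It assumes comma-separated values and labels in the dictionary; `split_csv`, `specs`, `new_cols` and `result` are hypothetical names used only for illustration, not part of the original answer.

```r
library(dplyr)
library(purrr)

mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c(1, 2, 1, 1, 2),
  c = c(1, 2, 1, 3, 1)
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select"),
  values = c(NA, "1, 2", "1, 2, 3"),
  valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")
)

# Hypothetical helper: turn "1, 2, 3" into c("1", "2", "3")
split_csv <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])

# Dictionary rows for single-select variables that are present in the data
specs <- dictionary %>%
  filter(type == "select", variable %in% names(mydata))

# pmap() walks the three dictionary columns in parallel and
# returns one factor per dictionary row
new_cols <- pmap(
  list(specs$variable, specs$values, specs$valuelabel),
  function(var, vals, labs) {
    factor(mydata[[var]], levels = split_csv(vals), labels = split_csv(labs))
  }
) %>%
  set_names(specs$variable)

# Overwrite the original numeric columns with their factor versions
result <- mydata
result[names(new_cols)] <- new_cols
```

This keeps the data in wide form throughout, so no row identifier is needed, and the untouched columns (a_1, a_2) are carried over unchanged.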
