Is there a way to create factors from a data dictionary in R?

I am trying to create factors from a data dictionary. I tried using Map(), but all the variables are converted to missing values (NA). What is the best way to approach this? A purrr solution would also be welcome.

library(dplyr)

mydata <- tibble(
  a_1 = c(20,22, 13,14,44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c(1, 2, 1, 1, 2),
  c = c(1, 2, 1, 3, 1)
)



dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select"),
  values = c(NA, "1, 2", "1, 2,3" ),
  valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")

)

# Expected results 
expectedata <- mydata %>% 
  mutate(
    b = factor(b, levels = c(1, 2), labels = c("Male", "Female")),
    c = factor(c, levels = c(1, 2, 3), 
               labels = c("Primary", "Secondary", "Tertiary"))
  )
expectedata 


# Select the factor variables

factor_vars <- dictionary %>%
  filter(type == "select") %>% pull(variable)


mydata[] <- Map(
  function(x, fctvalues, fctlabels) factor(x, fctvalues, fctlabels),
  mydata,
  dictionary$values[match(factor_vars, dictionary$variable)],
  dictionary$valuelabel[match(factor_vars, dictionary$variable)]
)
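The attempt above returns NA for two reasons. First, each fctvalues element is a single string such as "1, 2", which matches none of the numeric values in the data, so factor() maps every value to NA. Second, Map() walks over all four columns of mydata while the dictionary vectors have only two elements, so they are silently recycled and a_1 and a_2 are turned into (all-NA) factors as well. A minimal base R sketch of a fix, assuming codes and labels are always separated by a comma plus optional spaces (idx is just an illustrative helper name): map only over the "select" columns and split each string with strsplit() before calling factor().

```r
library(dplyr)

mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c(1, 2, 1, 1, 2),
  c = c(1, 2, 1, 3, 1)
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select"),
  values = c(NA, "1, 2", "1, 2,3"),
  valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")
)

factor_vars <- dictionary %>%
  filter(type == "select") %>%
  pull(variable)

# position of each factor variable in the dictionary
idx <- match(factor_vars, dictionary$variable)

# map only over the factor columns, and split "1, 2,3" into c("1", "2", "3")
# so that factor() receives one level and one label per element
mydata[factor_vars] <- Map(
  function(x, fctvalues, fctlabels) {
    factor(x,
           levels = strsplit(fctvalues, ",\\s*")[[1]],
           labels = strsplit(fctlabels, ",\\s*")[[1]])
  },
  mydata[factor_vars],
  dictionary$values[idx],
  dictionary$valuelabel[idx]
)

mydata
```

The numeric columns are matched against character levels because factor() converts x with as.character() before matching, so 1 matches "1".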

Via pivot_longer()/pivot_wider(), left_join() and a bit of data wrangling:

Data

library(tidyverse)

mydata <- tibble(
    a_1 = c(20,22, 13,14,44),
    a_2 = c(42, 13, 32, 31, 14),
    b = c(1, 2, 1, 1, 2),
    c = c(1, 2, 1, 3, 1)
)



dictionary <- tibble(
    variable = c("a", "b", "c"),
    label = c("Age", "Gender", "Education"),
    type = c("mselect", "select", "select"),
    values = c(NA, "1, 2", "1, 2, 3" ),
    valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")
    
)

Code

target_dictionary <- dictionary %>%
    # optional: filter(type == "select") %>%
    separate_rows(values, valuelabel) %>% 
    select(variable, values, valuelabel)

target_mydata <- mydata %>%
    # Assuming you have no unique identifier
    rownames_to_column("id") %>%
    pivot_longer(
        cols = c("b", "c"),
        names_to = "var_name",
        values_to = "var_value"
    ) %>%
    # the dictionary stores codes as character, so convert the numeric codes to match
    mutate(
        var_value = as.character(var_value)
    ) %>%
    left_join(
        target_dictionary,
        by = c("var_name" = "variable", "var_value" = "values")
    ) %>%
    pivot_wider(
        names_from = var_name,
        values_from = valuelabel, 
        id_cols = c("id", "a_1", "a_2")
    ) %>%
    select(-id)

Result:

> target_mydata
# A tibble: 5 × 4
    a_1   a_2 b      c        
  <dbl> <dbl> <chr>  <chr>    
1    20    42 Male   Primary  
2    22    13 Female Secondary
3    13    32 Male   Primary  
4    14    31 Male   Tertiary 
5    44    14 Female Primary  
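Note that this result has character columns (<chr>), while the question asks for factors. Since a purrr approach was also requested, here is a sketch using purrr::pmap(), assuming the same comma-separated dictionary format; select_dict, levels_list, labels_list and factor_data are illustrative names, not part of the answer above. It walks each factor column in parallel with its codes and labels, and the level order follows the dictionary.

```r
library(dplyr)
library(purrr)

mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c(1, 2, 1, 1, 2),
  c = c(1, 2, 1, 3, 1)
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select"),
  values = c(NA, "1, 2", "1, 2, 3"),
  valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")
)

# only single-select variables carry value codes
select_dict <- filter(dictionary, type == "select")

# one character vector of codes / labels per factor variable
levels_list <- strsplit(select_dict$values, ",\\s*")
labels_list <- strsplit(select_dict$valuelabel, ",\\s*")

# pmap() iterates over column, codes and labels in parallel
factor_data <- mydata
factor_data[select_dict$variable] <- pmap(
  list(as.list(mydata[select_dict$variable]), levels_list, labels_list),
  function(x, lv, lb) factor(x, levels = lv, labels = lb)
)

factor_data
```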


Edit: You could also go one step further and rename the factor columns using the labels from the dictionary.

Renaming the columns

target_mydata %>%
    rename_with(
        # look up each selected column name and return its dictionary label
        .fn = ~ unname(setNames(dictionary$label, dictionary$variable)[.x]),
        # rename only the columns described in the dictionary (here b and c)
        .cols = any_of(dictionary$variable)
    )

Result:

# A tibble: 5 × 4
    a_1   a_2 Gender Education
  <dbl> <dbl> <chr>  <chr>    
1    20    42 Male   Primary  
2    22    13 Female Secondary
3    13    32 Male   Primary  
4    14    31 Male   Tertiary 
5    44    14 Female Primary  
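As an alternative to rename_with(), dplyr's rename() accepts a named lookup vector in new_name = old_name form through any_of(), which skips entries (here "a") that are not columns of the data. A short sketch; target_mydata is rebuilt from the result shown above so the block runs on its own, and lookup and renamed are illustrative names.

```r
library(dplyr)

# the character result from the answer above
target_mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c("Male", "Female", "Male", "Male", "Female"),
  c = c("Primary", "Secondary", "Primary", "Tertiary", "Primary")
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select"),
  values = c(NA, "1, 2", "1, 2, 3"),
  valuelabel = c(NA, "Male, Female", "Primary, Secondary, Tertiary")
)

# c(Age = "a", Gender = "b", Education = "c"): new names point to old names
lookup <- setNames(dictionary$variable, dictionary$label)

# any_of() ignores "a", which is not a column of target_mydata
renamed <- rename(target_mydata, any_of(lookup))
names(renamed)
```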
