
How to use R and dplyr to paste column name as value for lookup

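I am working with the output coefficients of a glm regression model. I need to build a lookup key by pasting [column name].[factor level] together and then return the corresponding value from another data table. The column names have to be dynamic so that I don't have to name every column explicitly one by one. The value returned by the lookup is then multiplied by 1 (for factors) or by the actual numeric value, and the resulting calc_ columns are summed into the column Total.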

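I built an example in Excel but cannot replicate it in R. var_Factor1 combines the column name and the factor level of each row (using paste) to build the key for the lookup in the next step.

var_Number1 is just the column name, because the column is numeric and has no factor levels.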

library(dplyr)
library(data.table)  # needed for data.table() below

# original data
dt = data.table(
  Factor1  = c("A","B","C"),
  Number1 = c(10, 20,40),
  Factor2 = c("D","H","N"),
  Number2 = c(2, 5,3)
)

# Lookup table
model_coef = data.table(
    Factor1.A   = 10,
    Factor1.B   = 20,
    Factor1.C   = 30,
    Factor2.D   = 40,
    Factor2.H   = 50,
    Factor2.N   = 60,
    Number1 = 200,
    Number2 = 500
)

# initial steps
dt <- dt %>%
  mutate(
    var_Factor1 = paste("Factor1", Factor1, sep = "."),
    var_Number1 = "Number1",
    var_Factor2 = paste("Factor2", Factor2, sep = "."),
    var_Number2 = "Number2"
  ) %>%
  mutate(
    # attempted lookup; this is the step that does not work
    coef_Factor1 = model_coef[, var_Factor1]
  )

#The final output should produce (as replicated from Excel)


final_output = data.table (
  Factor1= c("A", "B", "C"),
  Number1= c(10, 20, 40),
  Factor2= c("D", "H", "N"),
  Number2= c(2, 5, 3),
  var_Factor1= c("Factor1.A", "Factor1.B", "Factor1.C"),
  var_Number1= c("Number1", "Number1", "Number1"),
  var_Factor2= c("Factor2.D", "Factor2.H", "Factor2.N"),
  var_Number2= c("Number2", "Number2", "Number2"),
  coef_Factor1= c(10, 20, 30),
  coef_Number1= c(200, 200, 200),
  coef_Factor2= c(40, 50, 60),
  coef_Number2= c(500, 500, 500),
  calc_Factor1= c(10, 20, 30),
  calc_Number1= c(2000, 4000, 8000),
  calc_Factor2= c(40, 50, 60),
  calc_Number2= c(1000, 2500, 1500),
  Total= c(3050, 6570, 9590)
)

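Trying to generate and manipulate dynamic columns is usually a bad idea. You are probably better off following tidy data conventions and making the data "long". Also, you appear to be mixing data.table and dplyr/tidyverse. In particular, this does not work: mutate(coef_Factor1 = model_coef[, var_Factor1])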
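For illustration, here is a minimal sketch of what that lookup seems intended to do, using a plain named vector in base R instead of data.table indexing inside mutate(). The names `coefs` and `key` are introduced here and are not part of the question; plain data frames stand in for the question's tables so that the snippet runs without any packages.

```r
# Stand-ins for the question's tables (base R data frames, trimmed to what is needed)
dt <- data.frame(
  Factor1 = c("A", "B", "C"),
  Number1 = c(10, 20, 40)
)
model_coef <- data.frame(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Number1 = 200
)

# Turn the one-row lookup table into a named numeric vector
coefs <- unlist(model_coef)

# Build the key "[column name].[factor level]" for every row
key <- paste("Factor1", dt$Factor1, sep = ".")

# Index the named vector by key: one coefficient per row
coef_Factor1 <- unname(coefs[key])
coef_Factor1
#> [1] 10 20 30
```

This only handles one column; it is meant to show the lookup itself, not a full solution.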

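I tidied your data and modified your code to use dplyr/tidyverse below:

  • Use tibble instead of data.table
  • Rebuild the lookup table in a tidy format so that it can be joined to your table properly
  • Use mutate to do the calculations you describe

Beyond your example, if you have more than 2 "numbers"/"factors" (your naming/labeling/numbering is confusing), there are ways to generalize this further so that the code multiplies coef * number generically for each "number"/combination. Also, your data implies, but does not make clear, that A is associated with D, B with H, and so on.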

library(tidyverse)

data <- tibble(
  Factor1 = c("A", "B", "C"),
  Number1 = c(10, 20, 40),
  Factor2 = c("D", "H", "N"),
  Number2 = c(2, 5, 3)
)
model_coef <- tibble(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Factor2.D = 40, Factor2.H = 50, Factor2.N = 60,
  Number1 = 200, Number2 = 500
)

(model_coef_factor1 <- model_coef %>%
    select(Factor1.A:Factor1.C) %>%
    pivot_longer(cols = everything(), names_to = c("number", "factor"), names_sep = "[.]", values_to = "coef_factor1") %>%
    select(-number))
#> # A tibble: 3 x 2
#>   factor coef_factor1
#>   <chr>         <dbl>
#> 1 A                10
#> 2 B                20
#> 3 C                30

(model_coef_factor2 <- model_coef %>%
    select(Factor2.D:Factor2.N) %>%
    pivot_longer(cols = everything(), names_to = c("number", "factor"), names_sep = "[.]", values_to = "coef_factor2") %>%
    select(-number))
#> # A tibble: 3 x 2
#>   factor coef_factor2
#>   <chr>         <dbl>
#> 1 D                40
#> 2 H                50
#> 3 N                60

(final_output <- data %>%
    left_join(model_coef_factor1, by = c("Factor1"="factor")) %>%
    left_join(model_coef_factor2, by = c("Factor2"="factor")) %>%
    mutate(coef_number1 = model_coef$Number1,
           coef_number2 = model_coef$Number2,
           calc_factor1 = coef_factor1,
           calc_number1 = Number1 * coef_number1,
           calc_factor2 = coef_factor2,
           calc_number2 = Number2 * coef_number2,
           total = calc_factor1 + calc_number1 + calc_factor2 + calc_number2) %>%
    select(total, everything()))
#> # A tibble: 3 x 13
#>   total Factor1 Number1 Factor2 Number2 coef_factor1 coef_factor2
#>   <dbl> <chr>     <dbl> <chr>     <dbl>        <dbl>        <dbl>
#> 1  3050 A            10 D             2           10           40
#> 2  6570 B            20 H             5           20           50
#> 3  9590 C            40 N             3           30           60
#> # ... with 6 more variables: coef_number1 <dbl>, coef_number2 <dbl>,
#> #   calc_factor1 <dbl>, calc_number1 <dbl>, calc_factor2 <dbl>,
#> #   calc_number2 <dbl>

Created on 2019-10-23 by the reprex package (v0.3.0)
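The generalization to more than two "factor"/"number" columns mentioned above could be sketched roughly as follows. This is only one possible approach, and it rests on an assumption: every character column is a factor whose key is [column name].[level], and every numeric column is a number whose key is the column name itself. The helper names `factor_cols`, `number_cols`, `coef_long`, `factor_part`, `number_part`, `row` and `multiplier` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

data <- tibble(
  Factor1 = c("A", "B", "C"),
  Number1 = c(10, 20, 40),
  Factor2 = c("D", "H", "N"),
  Number2 = c(2, 5, 3)
)
model_coef <- tibble(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Factor2.D = 40, Factor2.H = 50, Factor2.N = 60,
  Number1 = 200, Number2 = 500
)

# Assumption: column type decides whether a column is a factor or a number
factor_cols <- names(data)[sapply(data, is.character)]
number_cols <- names(data)[sapply(data, is.numeric)]

# Lookup table in long form: one row per coefficient name
coef_long <- pivot_longer(model_coef, everything(),
                          names_to = "key", values_to = "coef")

# Factor columns: key is "column.level", the coefficient is multiplied by 1
factor_part <- data %>%
  mutate(row = row_number()) %>%
  select(row, all_of(factor_cols)) %>%
  pivot_longer(-row, names_to = "column", values_to = "level") %>%
  mutate(key = paste(column, level, sep = "."), multiplier = 1)

# Numeric columns: key is the column name, the coefficient is multiplied by the value
number_part <- data %>%
  mutate(row = row_number()) %>%
  select(row, all_of(number_cols)) %>%
  pivot_longer(-row, names_to = "column", values_to = "multiplier") %>%
  mutate(key = column)

# Join the coefficients once and sum coef * multiplier per original row
totals <- bind_rows(factor_part, number_part) %>%
  left_join(coef_long, by = "key") %>%
  group_by(row) %>%
  summarise(Total = sum(coef * multiplier))

totals$Total
#> [1] 3050 6570 9590
```

Because the lookup is a single join on `key`, adding further factor or number columns should not require naming them individually, as long as matching coefficient names exist in model_coef.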
