How to use R and dplyr to paste column name as value for lookup
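I am working with the output coefficients from a glm regression model, and I need to create a lookup value using the key paste([column name].[factor level]) and then return the corresponding value from another data table. The column names must be dynamic so that I don't have to name each column explicitly one by one. The value returned by the lookup is then multiplied by 1 (for a factor) or by the actual numeric value, and the results for all columns (the calc_ columns in the expected output below) are summed into the column Total.
I did some examples in Excel but cannot replicate them in R. var_Factor1 combines the column name and the factor level of each row (using paste) to build the key for the lookup in the next step.
var_Number1 is just the column name, because the column is numeric and has no factor levels.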
library(data.table)  # provides data.table()
library(dplyr)       # provides mutate() and %>%
# original data
dt = data.table(
Factor1 = c("A","B","C"),
Number1 = c(10, 20,40),
Factor2 = c("D","H","N"),
Number2 = c(2, 5,3)
)
# Lookup table
model_coef = data.table(
Factor1.A = 10,
Factor1.B = 20,
Factor1.C = 30,
Factor2.D = 40,
Factor2.H = 50,
Factor2.N = 60,
Number1 = 200,
Number2 = 500
)
# initial steps: build the lookup keys, then attempt the lookup
dt <- dt %>% mutate(
  var_Factor1 = paste("Factor1", Factor1, sep = "."),
  var_Number1 = "Number1",
  var_Factor2 = paste("Factor2", Factor2, sep = "."),
  var_Number2 = "Number2"
) %>% mutate(
  # attempted lookup; this is the step that does not work as intended
  coef_Factor1 = model_coef[, var_Factor1]
)
#The final output should produce (as replicated from Excel)
final_output = data.table (
Factor1= c("A", "B", "C"),
Number1= c(10, 20, 40),
Factor2= c("D", "H", "N"),
Number2= c(2, 5, 3),
var_Factor1= c("Factor1.A", "Factor1.B", "Factor1.C"),
var_Number1= c("Number1", "Number1", "Number1"),
var_Factor2= c("Factor2.D", "Factor2.H", "Factor2.N"),
var_Number2= c("Number2", "Number2", "Number2"),
coef_Factor1= c(10, 20, 30),
coef_Number1= c(200, 200, 200),
coef_Factor2= c(40, 50, 60),
coef_Number2= c(500, 500, 500),
calc_Factor1= c(10, 20, 30),
calc_Number1= c(2000, 4000, 8000),
calc_Factor2= c(40, 50, 60),
calc_Number2= c(1000, 2500, 1500),
Total= c(3050, 6570, 9590)
)
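Trying to generate and manipulate dynamic columns is usually a bad idea. You are probably better off following tidy data conventions and making the data "long". You also appear to be mixing data.table and dplyr/tidyverse. In particular, this does not work: mutate(coef_Factor1 = model_coef[, var_Factor1])
I tidied your data and modified your code to use dplyr/tidyverse below:
Beyond your example, if you have more than 2 "Numbers"/"Factors" (your naming/labelling/numbering is confusing), there are ways to generalize further so that the code multiplies coef * number generically for each "Number"/combination. Also, your data implies, but does not make clear, that A is associated with D, B with H, and so on.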
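As a minimal sketch of one way to get the lookup that line was reaching for, the one-row coefficient table can be flattened to a named vector and indexed by the pasted keys. This uses base R only; the name `coefs` and the switch to a plain data.frame are assumptions made for illustration, not part of the original code.

```r
# Lookup table from the question, here as a plain one-row data.frame
model_coef <- data.frame(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Factor2.D = 40, Factor2.H = 50, Factor2.N = 60,
  Number1 = 200, Number2 = 500
)

# Flatten to a named numeric vector: names are the lookup keys
coefs <- unlist(model_coef)

dt <- data.frame(
  Factor1 = c("A", "B", "C"),
  Number1 = c(10, 20, 40),
  stringsAsFactors = FALSE
)

# Build one key per row, then index the vector by name (vectorised lookup)
var_Factor1 <- paste("Factor1", dt$Factor1, sep = ".")
dt$coef_Factor1 <- unname(coefs[var_Factor1])

# A numeric column has a single coefficient, recycled to every row
dt$coef_Number1 <- unname(coefs["Number1"])
```

Indexing a named vector with a character vector returns one element per key, which is why no row-by-row loop is needed here.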
library(tidyverse)
data <- tibble(
  Factor1 = c("A", "B", "C"),
  Number1 = c(10, 20, 40),
  Factor2 = c("D", "H", "N"),
  Number2 = c(2, 5, 3)
)
model_coef <- tibble(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Factor2.D = 40, Factor2.H = 50, Factor2.N = 60,
  Number1 = 200, Number2 = 500
)
(model_coef_factor1 <- model_coef %>%
select(Factor1.A:Factor1.C) %>%
pivot_longer(cols = everything(), names_to = c("number", "factor"), names_sep = "[.]", values_to = "coef_factor1") %>%
select(-number))
#> # A tibble: 3 x 2
#> factor coef_factor1
#> <chr> <dbl>
#> 1 A 10
#> 2 B 20
#> 3 C 30
(model_coef_factor2 <- model_coef %>%
select(Factor2.D:Factor2.N) %>%
pivot_longer(cols = everything(), names_to = c("number", "factor"), names_sep = "[.]", values_to = "coef_factor2") %>%
select(-number))
#> # A tibble: 3 x 2
#> factor coef_factor2
#> <chr> <dbl>
#> 1 D 40
#> 2 H 50
#> 3 N 60
(final_output <- data %>%
left_join(model_coef_factor1, by = c("Factor1"="factor")) %>%
left_join(model_coef_factor2, by = c("Factor2"="factor")) %>%
mutate(coef_number1 = model_coef$Number1,
coef_number2 = model_coef$Number2,
calc_factor1 = coef_factor1,
calc_number1 = Number1 * coef_number1,
calc_factor2 = coef_factor2,
calc_number2 = Number2 * coef_number2,
total = calc_factor1 + calc_number1 + calc_factor2 + calc_number2) %>%
select(total, everything()))
#> # A tibble: 3 x 13
#> total Factor1 Number1 Factor2 Number2 coef_factor1 coef_factor2
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 3050 A 10 D 2 10 40
#> 2 6570 B 20 H 5 20 50
#> 3 9590 C 40 N 3 30 60
#> # ... with 6 more variables: coef_number1 <dbl>, coef_number2 <dbl>,
#> # calc_factor1 <dbl>, calc_number1 <dbl>, calc_factor2 <dbl>,
#> # calc_number2 <dbl>
Created on 2019-10-23 by the reprex package (v0.3.0)
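The further generalization mentioned in the answer could be sketched roughly as follows: put both the data and the coefficients in long form, join on the key, and sum per row, so that no column is named explicitly. This is only a sketch under stated assumptions: it assumes reasonably recent dplyr and tidyr versions (for `where()` and `.groups`), that categorical columns are stored as character and all other columns are numeric predictors, and the helper names `indexed`, `factor_terms`, `numeric_terms`, `coef_long` and `totals` are made up for illustration.

```r
library(dplyr)
library(tidyr)

data <- tibble(
  Factor1 = c("A", "B", "C"), Number1 = c(10, 20, 40),
  Factor2 = c("D", "H", "N"), Number2 = c(2, 5, 3)
)
model_coef <- tibble(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Factor2.D = 40, Factor2.H = 50, Factor2.N = 60,
  Number1 = 200, Number2 = 500
)

# Coefficients in long form: one row per lookup key
coef_long <- pivot_longer(model_coef, everything(),
                          names_to = "key", values_to = "coef")

# Row id so the terms can be summed back per original row
indexed <- mutate(data, row = row_number())

# Character columns: key is "<column>.<level>", multiplier is 1
factor_terms <- indexed %>%
  select(row, where(is.character)) %>%
  pivot_longer(-row, names_to = "variable", values_to = "level") %>%
  transmute(row, key = paste(variable, level, sep = "."), multiplier = 1)

# Numeric columns: key is the column name, multiplier is the value itself
numeric_terms <- indexed %>%
  select(row, where(is.numeric)) %>%
  pivot_longer(-row, names_to = "key", values_to = "multiplier")

# Join every term to its coefficient and add up coef * multiplier per row
totals <- bind_rows(factor_terms, numeric_terms) %>%
  left_join(coef_long, by = "key") %>%
  group_by(row) %>%
  summarise(Total = sum(coef * multiplier), .groups = "drop")
```

With the example data, `totals$Total` comes out as 3050, 6570 and 9590, matching the Total column in the question's expected output. Adding a third factor or number then only requires new columns in `data` and matching entries in `model_coef`.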