dplyr mutate multiple columns based on names in vectors
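I would like to multiply pairs of columns with each other using dplyr's mutate function. However, rather than writing a new line for each mutate condition, I want to use the column names stored in the vectors var1 and var2. For example, in the end I want my existing bankdata to have an additional column named result1 that contains the product of the cash and loans columns. This continues until 3 new columns have been created.
Reproducible code: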
bankname <- c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E")
bankid <- c(1, 2, 3, 4, 5)
year <- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
cash <- c(7250, 10243, 13357, 35000, 351266)
bond <- c(20218, 185151, 177612, 20000, 314012)
loans <- c(29513, 2800, NA, 5000, NA)
bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)
The vectors var1 and var2 contain the names of the columns I want to multiply (cash*loans, bond*cash, loans*bankid), and output holds the names of the new columns:
var1 <- c("cash", "bond", "loans")
var2 <- c("loans","cash", "bankid")
output <- c("result1", "result2", "result3")
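For reference, the target can be written out by hand, one product per line. This is only a sketch of the desired result, assuming a reduced copy of bankdata with just the four columns involved; the name `expected` is introduced here for illustration and is not part of the original post.

```r
library(dplyr)

# Reduced copy of bankdata: only the columns used in the three products
bankdata <- data.frame(
  bankid = c(1, 2, 3, 4, 5),
  cash   = c(7250, 10243, 13357, 35000, 351266),
  bond   = c(20218, 185151, 177612, 20000, 314012),
  loans  = c(29513, 2800, NA, 5000, NA)
)

# The three products spelled out explicitly; this is the repetition
# that the vector-driven version is meant to replace
expected <- bankdata %>%
  mutate(
    result1 = cash * loans,
    result2 = bond * cash,
    result3 = loans * bankid
  )
```

Any solution driven by var1, var2 and output should produce the same three columns.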
I would like to do something like:
bankdata %>%
  mutate_at(.funs = funs(output = var1*var2), vars(var1, var2))
bankdata %>%
  mutate_at(.funs = funs(result1 = cash*., result2 = bond*., result3 = loans*.), vars(var2))
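Using the tidyeval approach, we build a function that takes strings as input and creates a new column. Note the use of rlang::sym and !! (bang bang).
After that, we can use purrr::pmap_dfc to loop through var1 and var2 to create the new columns, whose names are supplied by output.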
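To see what rlang::sym and !! do in isolation, here is a minimal sketch on a toy data frame. The data frame `df` and the names `col_sym` and `new_name` are made up for this illustration and do not appear in the original post.

```r
library(dplyr)

# Toy data, not taken from the question
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# The string "a" becomes the symbol `a`
col_sym <- rlang::sym("a")

# !! unquotes the symbol, so this behaves like mutate(doubled = a * 2)
out <- df %>% mutate(doubled = (!!col_sym) * 2)

# := is needed when the name on the left-hand side is itself unquoted
new_name <- rlang::sym("a_times_b")
out <- out %>% mutate(!!new_name := a * b)
```

The parentheses around `!!col_sym` are not strictly required inside tidyeval verbs, but they make the intended grouping explicit.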
library(tidyverse)
bankname <- c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E")
bankid <- c(1, 2, 3, 4, 5)
year <- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
cash <- c(7250, 10243, 13357, 35000, 351266)
bond <- c(20218, 185151, 177612, 20000, 314012)
loans <- c(29513, 2800, NA, 5000, NA)
bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)
originalNames <- names(bankdata)
var1 <- c("cash", "bond", "loans")
var2 <- c("loans","cash", "bankid")
output <- c("result1", "result2", "result3")
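my_mutate <- function(df, var1, var2, output) {
  # convert the column-name strings to symbols
  var1 <- rlang::sym(var1)
  var2 <- rlang::sym(var2)
  output <- rlang::sym(output)
  # unquote the symbols so mutate() sees them as column names
  df <- df %>%
    mutate(!! output := !! var1 * !! var2)
  return(df)
}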
# test
my_mutate(bankdata, var1[1], var2[1], output[1])
#>   bankname bankid year   totass   cash   bond loans   result1
#> 1   Bank A      1 1881   244789   7250  20218 29513 213969250
#> 2   Bank B      2 1881   195755  10243 185151  2800  28680400
#> 3   Bank C      3 1881   107736  13357 177612    NA        NA
#> 4   Bank D      4 1881   170600  35000  20000  5000 175000000
#> 5   Bank E      5 1881 32000000 351266 314012    NA        NA
# loop through 3 lists simultaneously
# keep only original and result* columns
pmap_dfc(list(var1, var2, output), ~ my_mutate(bankdata, ..1, ..2, ..3)) %>%
  select(!! originalNames, starts_with("result"))
#>   bankname bankid year   totass   cash   bond loans   result1      result2
#> 1   Bank A      1 1881   244789   7250  20218 29513 213969250    146580500
#> 2   Bank B      2 1881   195755  10243 185151  2800  28680400   1896501693
#> 3   Bank C      3 1881   107736  13357 177612    NA        NA   2372363484
#> 4   Bank D      4 1881   170600  35000  20000  5000 175000000    700000000
#> 5   Bank E      5 1881 32000000 351266 314012    NA        NA 110301739192
#>   result3
#> 1   29513
#> 2    5600
#> 3      NA
#> 4   20000
#> 5      NA
Created on 2018-04-18 by the reprex package (v0.2.0).
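The pmap_dfc call above binds three full copies of bankdata side by side and then selects the columns it needs. Depending on the installed dplyr version, the duplicated column names may be repaired with suffixes during binding, in which case the select step might not find the original names. As an alternative sketch (not the answer's method), the same columns can be added one at a time to a single data frame using the `.data` pronoun. It assumes a reduced copy of bankdata with only the columns involved, and the name `result` is introduced here for illustration.

```r
library(dplyr)

# Reduced copy of bankdata: only the columns used in the three products
bankdata <- data.frame(
  bankid = c(1, 2, 3, 4, 5),
  cash   = c(7250, 10243, 13357, 35000, 351266),
  bond   = c(20218, 185151, 177612, 20000, 314012),
  loans  = c(29513, 2800, NA, 5000, NA)
)

var1   <- c("cash", "bond", "loans")
var2   <- c("loans", "cash", "bankid")
output <- c("result1", "result2", "result3")

# Add one new column per iteration, always to the same data frame,
# so no duplicate columns are ever created
result <- bankdata
for (i in seq_along(output)) {
  result <- result %>%
    mutate(!!output[i] := .data[[var1[i]]] * .data[[var2[i]]])
}
```

Here `.data[[name]]` looks up a column by its string name, so no conversion to symbols is needed, and the loop leaves the original columns untouched.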