Return multiple columns in dplyr mutate
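How can I use mutate to apply a function to a column when the function returns multiple columns? Below I try to create dummy (one-hot) columns from a character column. (I know there are probably 100+ ways to make dummy columns, but this is meant to illustrate returning multiple columns.) The result comes back with a dollar sign in the column name (e.g., Treatment$Isnonchilled rather than nonchilled), which means the column is not an atomic vector but a data frame.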
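library(textshape)
library(dplyr)

# Build indicator (one-hot) columns from x; returns a tibble, not a vector
one_hot <- function(x, drop.jth = TRUE, keep.na = TRUE, prefix = "Is", ...) {
  y <- tibble::as_tibble(textshape::mtabulate(x))
  if (keep.na) y[is.na(x), ] <- NA        # rows where x is missing become NA
  if (drop.jth) y <- y[1:(ncol(y) - 1)]   # drop the last indicator column
  colnames(y) <- paste0(prefix, colnames(y))
  y
}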
CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    across(c(Treatment), one_hot)
  )

CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    across(c(Treatment), one_hot)
  ) %>%
  lapply(class)
$Plant
[1] "ordered" "factor"
$Type
[1] "factor"
$Treatment
[1] "tbl_df" "tbl" "data.frame"
$conc
[1] "numeric"
$uptake
[1] "numeric"
$conc2
[1] "numeric"
Well, you don't have to modify your function. Just do this:
CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    across(c(Treatment), one_hot)$Treatment # see here
  )
Output:
# A tibble: 84 x 7
   Plant Type   Treatment   conc uptake   conc2 Isnonchilled
   <ord> <fct>  <fct>      <dbl>  <dbl>   <dbl>        <int>
 1 Qn1   Quebec nonchilled    95   16      9025            1
 2 Qn1   Quebec nonchilled   175   30.4   30625            1
 3 Qn1   Quebec nonchilled   250   34.8   62500            1
 4 Qn1   Quebec nonchilled   350   37.2  122500            1
 5 Qn1   Quebec nonchilled   500   35.3  250000            1
 6 Qn1   Quebec nonchilled   675   39.2  455625            1
 7 Qn1   Quebec nonchilled  1000   39.7 1000000            1
 8 Qn2   Quebec nonchilled    95   13.6    9025            1
 9 Qn2   Quebec nonchilled   175   27.3   30625            1
10 Qn2   Quebec nonchilled   250   37.1   62500            1
# ... with 74 more rows
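A note on why extracting with $Treatment helps: mutate() stores a data frame assigned to a name as a single df-column, but splices an unnamed data-frame result into separate columns. The sketch below shows that difference on a toy tibble; two_cols is a hypothetical helper made up for illustration and is not part of dplyr or textshape.

```r
library(dplyr)

df <- tibble(g = c("a", "b", "a"))

# Hypothetical helper (illustration only): returns a two-column tibble
two_cols <- function(x) tibble(upper = toupper(x), n_char = nchar(x))

# Named result: the whole tibble is stored as one df-column called `res`
packed <- df %>% mutate(res = two_cols(g))

# Unnamed result: the tibble's columns are spliced into the output
spliced <- df %>% mutate(two_cols(g))

names(packed)
names(spliced)
```

In the question, across() produces a tibble whose Treatment column is itself a tibble; pulling that inner tibble out with $Treatment hands mutate() an unnamed data frame, which it then splices.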
To mutate across multiple columns:
CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    bind_cols(as.list(across(starts_with("T"), one_hot)))
  )
Output:
# A tibble: 84 x 8
   Plant Type   Treatment   conc uptake   conc2 IsQuebec Isnonchilled
   <ord> <fct>  <fct>      <dbl>  <dbl>   <dbl>    <int>        <int>
 1 Qn1   Quebec nonchilled    95   16      9025        1            1
 2 Qn1   Quebec nonchilled   175   30.4   30625        1            1
 3 Qn1   Quebec nonchilled   250   34.8   62500        1            1
 4 Qn1   Quebec nonchilled   350   37.2  122500        1            1
 5 Qn1   Quebec nonchilled   500   35.3  250000        1            1
 6 Qn1   Quebec nonchilled   675   39.2  455625        1            1
 7 Qn1   Quebec nonchilled  1000   39.7 1000000        1            1
 8 Qn2   Quebec nonchilled    95   13.6    9025        1            1
 9 Qn2   Quebec nonchilled   175   27.3   30625        1            1
10 Qn2   Quebec nonchilled   250   37.1   62500        1            1
# ... with 74 more rows
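As a possible alternative to wrapping across() in bind_cols(), newer dplyr releases (1.1.0 and later, which is an assumption about your installed version) give across() an .unpack argument that expands data-frame results into ordinary columns. The sketch below uses one_hot_base, a hypothetical base-R stand-in for the question's one_hot, so that it runs without textshape.

```r
library(dplyr)

# Hypothetical stand-in for one_hot(): one integer indicator per level,
# with the last level dropped (not the textshape implementation)
one_hot_base <- function(x, prefix = "Is") {
  lv <- levels(factor(x))
  lv <- lv[-length(lv)]                              # drop the last level
  out <- lapply(lv, function(l) as.integer(x == l))  # 1 where x equals the level
  names(out) <- paste0(prefix, lv)
  tibble::as_tibble(out)
}

res <- CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    # "{inner}" names each new column after the helper's own column names
    across(c(Type, Treatment), one_hot_base, .unpack = "{inner}")
  )

names(res)
```

With .unpack = TRUE instead of a glue string, the new columns would be named from both the input column and the helper's column, joined by an underscore.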
The output of the function is a data.frame. Inside the function, I use pull to extract a vector.
library(textshape)
library(dplyr)

one_hot <- function(x, drop.jth = TRUE, keep.na = TRUE, prefix = "Is", ...) {
  y <- tibble::as_tibble(textshape::mtabulate(x))
  if (keep.na) y[is.na(x), ] <- NA
  if (drop.jth) y <- y[1:(ncol(y) - 1)]
  colnames(y) <- paste0(prefix, colnames(y))
  y %>% pull(1) # you need to transform the df to a vector
}

CO2 %>%
  as_tibble() %>%
  mutate(
    conc2 = conc^2,
    across(c(Treatment), one_hot)
  )
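One thing to keep in mind with a vector-returning function: across() writes each result back under the input column's name by default, so the original Treatment values would be replaced by the indicator. A minimal sketch of keeping the originals by supplying .names; is_first_level is a hypothetical helper used only for illustration, and it keeps a single indicator per column, much like pull(1) does.

```r
library(dplyr)

# Hypothetical helper (illustration only): 1 when x equals its first level
is_first_level <- function(x) as.integer(x == levels(factor(x))[1])

res <- CO2 %>%
  as_tibble() %>%
  mutate(
    # .names adds new columns instead of overwriting Type and Treatment
    across(c(Type, Treatment), is_first_level, .names = "Is_{.col}")
  )

names(res)
```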
Using your original function and purrr::map, you can generate a list of columns and then bind them back to your original data frame.
purrr::map(c('Treatment', 'Type'), ~ one_hot(CO2[[.x]])) %>%
  bind_cols(CO2)
# A tibble: 84 x 7
  Isnonchilled IsQuebec Plant Type   Treatment   conc uptake
         <int>    <int> <ord> <fct>  <fct>      <dbl>  <dbl>
1            1        1 Qn1   Quebec nonchilled    95   16
2            1        1 Qn1   Quebec nonchilled   175   30.4
3            1        1 Qn1   Quebec nonchilled   250   34.8
4            1        1 Qn1   Quebec nonchilled   350   37.2
5            1        1 Qn1   Quebec nonchilled   500   35.3
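The indicator columns come first in that output because the mapped list is the first argument to bind_cols(). If the original column order matters, a sketch along these lines passes the data first; one_hot_base is a hypothetical base-R stand-in for one_hot, so the example does not need textshape.

```r
library(dplyr)

# Hypothetical stand-in for one_hot(): one integer indicator per level,
# with the last level dropped (not the textshape implementation)
one_hot_base <- function(x, prefix = "Is") {
  lv <- levels(factor(x))
  lv <- lv[-length(lv)]                              # drop the last level
  out <- lapply(lv, function(l) as.integer(x == l))  # 1 where x equals the level
  names(out) <- paste0(prefix, lv)
  tibble::as_tibble(out)
}

# One tibble of indicator columns per input column
dummies <- purrr::map(c("Treatment", "Type"), ~ one_hot_base(CO2[[.x]]))

# Original data first, new columns appended at the end
res <- bind_cols(as_tibble(CO2), dummies)

names(res)
```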