Syntax for multiple successive operations across columns in `dplyr`
I'm struggling with the correct syntax for multiple successive operations `across` columns in `dplyr`. In this data:
df <- structure(list(A1 = c(838.611, 824.048, 668.901, 225.075, 0,
0, 341.291, 0, 101.652, 127.341, 0, 297.092, 0, 0, 0, 0, 0, 764.737,
759.51, 772.21), A2 = c(499.041, 492.997, 486.132, 469.503, 476.782,
464.18, 469.833, 462.317, 455.507, 441.47, 490.147, 430.844,
0, 0, 0, 0, 0, 0, 0, 124.068)), row.names = c(NA, 20L), class = "data.frame")
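say, I want to implement these changes across columns `A1` and `A2`:
1. replace `0` with `NA`
2. replace outliers (the values reported by `boxplot()`) with `NA`
3. interpolate the `NA` values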
With the following syntax, only change 1 is applied; changes 2 and 3 are not:
library(dplyr)
library(zoo)
df %>%
  mutate(across(starts_with("A"),
                ~na_if(.,0),
                ~ifelse(. %in% boxplot(.)$out, NA, .),
                ~na.approx(., na.rm = FALSE, rule = 2)))
A1 A2
1 838.611 499.041
2 824.048 492.997
3 668.901 486.132
4 225.075 469.503
5 NA 476.782
6 NA 464.180
7 341.291 469.833
8 NA 462.317
9 101.652 455.507
10 127.341 441.470
11 NA 490.147
12 297.092 430.844
13 NA NA
14 NA NA
15 NA NA
16 NA NA
17 NA NA
18 764.737 NA
19 759.510 NA
20 772.210 124.068
EDIT: The correct output is obtained from this (repetitive) type of code, which I'd like to avoid:
df %>%
  mutate(across(starts_with("A"),
                ~na_if(.,0))) %>%
  mutate(across(starts_with("A"),
                ~ifelse(. %in% boxplot(.)$out, NA, .))) %>%
  mutate(across(starts_with("A"),
                ~na.approx(., na.rm = FALSE, rule = 2)))
A1 A2
1 838.6110 499.041
2 824.0480 492.997
3 668.9010 486.132
4 225.0750 469.503
5 263.8137 476.782
6 302.5523 464.180
7 341.2910 469.833
8 221.4715 462.317
9 101.6520 455.507
10 127.3410 441.470
11 212.2165 490.147
12 297.0920 430.844
13 375.0328 430.844
14 452.9737 430.844
15 530.9145 430.844
16 608.8553 430.844
17 686.7962 430.844
18 764.7370 430.844
19 759.5100 430.844
20 772.2100 430.844
To answer OP's question in the comments:
df %>%
  mutate(
    across(
      starts_with("A"),
      list(
        ~na_if(.,0),
        ~ifelse(. %in% boxplot(.)$out, NA, .),
        ~na.approx(., na.rm = FALSE, rule = 2)
      )
    )
  )
A1 A2 A1_1 A1_2 A1_3 A2_1 A2_2 A2_3
1 838.611 499.041 838.611 838.611 838.611 499.041 499.041 499.041
2 824.048 492.997 824.048 824.048 824.048 492.997 492.997 492.997
3 668.901 486.132 668.901 668.901 668.901 486.132 486.132 486.132
4 225.075 469.503 225.075 225.075 225.075 469.503 469.503 469.503
5 0.000 476.782 NA 0.000 0.000 476.782 476.782 476.782
6 0.000 464.180 NA 0.000 0.000 464.180 464.180 464.180
7 341.291 469.833 341.291 341.291 341.291 469.833 469.833 469.833
8 0.000 462.317 NA 0.000 0.000 462.317 462.317 462.317
9 101.652 455.507 101.652 101.652 101.652 455.507 455.507 455.507
10 127.341 441.470 127.341 127.341 127.341 441.470 441.470 441.470
11 0.000 490.147 NA 0.000 0.000 490.147 490.147 490.147
12 297.092 430.844 297.092 297.092 297.092 430.844 430.844 430.844
13 0.000 0.000 NA 0.000 0.000 NA 0.000 0.000
14 0.000 0.000 NA 0.000 0.000 NA 0.000 0.000
15 0.000 0.000 NA 0.000 0.000 NA 0.000 0.000
16 0.000 0.000 NA 0.000 0.000 NA 0.000 0.000
17 0.000 0.000 NA 0.000 0.000 NA 0.000 0.000
18 764.737 0.000 764.737 764.737 764.737 NA 0.000 0.000
19 759.510 0.000 759.510 759.510 759.510 NA 0.000 0.000
20 772.210 124.068 772.210 772.210 772.210 124.068 124.068 124.068
You can give the output columns more meaningful names by (amongst other options) naming the list elements:
df %>%
  mutate(
    across(
      starts_with("A"),
      list(
        "Zero"=~na_if(.,0),
        "BoxPlot"=~ifelse(. %in% boxplot(.)$out, NA, .),
        "Approx"=~na.approx(., na.rm = FALSE, rule = 2)
      )
    )
  )
A1 A2 A1_Zero A1_BoxPlot A1_Approx A2_Zero A2_BoxPlot A2_Approx
1 838.611 499.041 838.611 838.611 838.611 499.041 499.041 499.041
2 824.048 492.997 824.048 824.048 824.048 492.997 492.997 492.997
...
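One of those other options is the `.names` argument of `across()`, which takes a glue-style template built from `{.col}` and `{.fn}`. Below is a minimal sketch on made-up data; the `toy` data frame, the `Double` step and the `"{.fn}_of_{.col}"` template are illustrative assumptions, not part of the question:

```r
library(dplyr)

# Toy data (hypothetical, not the df from the question)
toy <- data.frame(A1 = c(0, 5, 7), A2 = c(2, 0, 4))

res <- toy %>%
  mutate(
    across(
      starts_with("A"),
      list(Zero = ~ na_if(., 0), Double = ~ . * 2),
      # {.col} is the input column name, {.fn} the name of the list element
      .names = "{.fn}_of_{.col}"
    )
  )

print(names(res))
# "A1" "A2" "Zero_of_A1" "Double_of_A1" "Zero_of_A2" "Double_of_A2"
```

This only changes the names: there is still one output column per combination of input column and function.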
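Update in response to OP's comment below
`across()` has a `.names` argument that gives control over how the output columns are named, but that doesn't help here, because `across()` outputs one column for each combination of input column and function. We want to apply several functions to each input column, producing one output column per input column. To do that, wrap the functions that process each column in a single function. This has the same effect as the multiple `mutate` calls in OP's edit to the original question.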
df %>%
  mutate(
    across(
      starts_with("A"),
      function(.x) {
        .x <- na_if(.x, 0)
        .x <- ifelse(.x %in% boxplot(.x)$out, NA, .x)
        .x <- na.approx(.x, na.rm = FALSE, rule = 2)
        .x
      }
    )
  )
A1 A2
1 838.6110 499.041
2 824.0480 492.997
3 668.9010 486.132
4 225.0750 469.503
5 263.8137 476.782
6 302.5523 464.180
7 341.2910 469.833
8 221.4715 462.317
9 101.6520 455.507
10 127.3410 441.470
11 212.2165 490.147
12 297.0920 430.844
13 375.0328 430.844
14 452.9737 430.844
15 530.9145 430.844
16 608.8553 430.844
17 686.7962 430.844
18 764.7370 430.844
19 759.5100 430.844
20 772.2100 430.844
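As for why the call in the question applied only the first step: as far as I can tell, `across()` takes its functions from the second argument, `.fns`, and any further unnamed arguments land in `...`, which is handed on to that function as extra arguments rather than treated as more functions (recent dplyr releases deprecate `...` in `across()`). A formula lambda simply ignores extra arguments, so the second and third formulas never run. A small check of both points, using `rlang::as_function()` as a stand-in for the lambda conversion (that stand-in is my assumption about the internals, not something shown above):

```r
library(dplyr)

# The first three formals of across(): column selection, functions, then dots
arg_names <- names(formals(dplyr::across))
print(arg_names[1:3])

# A formula lambda accepts extra arguments and silently ignores them
add_one <- rlang::as_function(~ .x + 1)
ignored <- add_one(1, ~ this_formula_is_never_evaluated)
print(ignored)
```

So passing the steps as a `list()` (separate output columns) or wrapping them in one function (one output column) are the two ways to get more than one function into `.fns`.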
For clarity, I would write a custom function that can be applied to the columns in `across`.
library(dplyr)
library(zoo)
apply_fun <- function(x) {
  na_if(x, 0) %>%
    ifelse(. %in% boxplot(.)$out, NA, .) %>%
    na.approx(., na.rm = FALSE, rule = 2)
}
df %>% mutate(across(starts_with("A"), apply_fun))
# A1 A2
#1 838.6110 499.041
#2 824.0480 492.997
#3 668.9010 486.132
#4 225.0750 469.503
#5 263.8137 476.782
#6 302.5523 464.180
#7 341.2910 469.833
#8 221.4715 462.317
#9 101.6520 455.507
#10 127.3410 441.470
#11 212.2165 490.147
#12 297.0920 430.844
#13 375.0328 430.844
#14 452.9737 430.844
#15 530.9145 430.844
#16 608.8553 430.844
#17 686.7962 430.844
#18 764.7370 430.844
#19 759.5100 430.844
#20 772.2100 430.844
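If the list of steps changes often, the same idea can be sketched with a small helper that chains any number of single-argument functions into one. The helper name `chain`, the `clean` function and the `toy` data are hypothetical, and `plot = FALSE` is added only so that `boxplot()` returns its statistics without drawing anything:

```r
library(dplyr)
library(zoo)

# Hypothetical helper: compose functions left to right into a single function
chain <- function(...) {
  fns <- list(...)
  function(x) Reduce(function(acc, f) f(acc), fns, x)
}

clean <- chain(
  function(x) na_if(x, 0),                                        # 0 -> NA
  function(x) ifelse(x %in% boxplot(x, plot = FALSE)$out, NA, x),  # outliers -> NA
  function(x) na.approx(x, na.rm = FALSE, rule = 2)               # interpolate NA
)

# Toy data (not the df from the question)
toy <- data.frame(A1 = c(10, 0, 30, 0, 50), A2 = c(4, 6, 0, 0, 0))

res <- toy %>% mutate(across(starts_with("A"), clean))
print(res)
```

Each step receives the result of the previous one, so this behaves like the successive `mutate()` calls in the question while keeping a single `across()`.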