which() function in filter() with dplyr
I'm trying to filter a dataset and then set the outliers to the mean. Example data frame:
structure(list(INDEX = c(1, 2, 3, 4, 5, 6), TARGET_WINS = c(39,
70, 86, 70, 82, 75), TEAM_BATTING_H = c(1445, 1339, 1377, 1387,
1297, 1279), TEAM_BATTING_2B = c(194, 219, 232, 209, 186, 200
), TEAM_BATTING_3B = c(39, 22, 35, 38, 27, 36), TEAM_BATTING_HR = c(13,
190, 137, 96, 102, 92), TEAM_BATTING_BB = c(143, 685, 602, 451,
472, 443), TEAM_BATTING_SO = c(842, 1075, 917, 922, 920, 973),
TEAM_BASERUN_SB = c(NA, 37, 46, 43, 49, 107), TEAM_BASERUN_CS = c(NA,
28, 27, 30, 39, 59), TEAM_BATTING_HBP = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), TEAM_PITCHING_H = c(9364,
1347, 1377, 1396, 1297, 1279), TEAM_PITCHING_HR = c(84, 191,
137, 97, 102, 92), TEAM_PITCHING_BB = c(927, 689, 602, 454,
472, 443), TEAM_PITCHING_SO = c(5456, 1082, 917, 928, 920,
973), TEAM_FIELDING_E = c(1011, 193, 175, 164, 138, 123),
TEAM_FIELDING_DP = c(NA, 155, 153, 156, 168, 149)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Using dplyr, I filter for the outliers and then try to mutate the TEAM_FIELDING_E column based on the corrected (non-outlier) mean:
train %>%
  filter(which(boxplot.stats(train$TEAM_FIELDING_E)$out %in% train$TEAM_FIELDING_E, arr.ind = TRUE) == TRUE) %>%
  mutate(
    TEAM_FIELDING_E = NA,
    TEAM_FIELDING_E = mean(train$TEAM_FIELDING_E)
  )
This returns the error
Error in filter_impl(.data, quo) : Result must have length 2276, not 303
(the original dataset has 2276 rows and 303 TEAM_FIELDING_E outliers). How can I use filter() so that my mutate() only affects the filtered rows?
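Inside dplyr verbs, use bare variable names rather than [[ or $. Also, if you are trying to filter for certain values, you can filter on those values directly instead of using which to find the matching positions.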
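As a small illustration of both points, here is a sketch that selects the outlier rows with a bare column name and a plain logical condition. It assumes dplyr is installed; toy and outlier_rows are hypothetical names, and toy holds only the six TEAM_FIELDING_E values from the sample data above rather than the full dataset.

```r
library(dplyr)

# Hypothetical toy data: the six TEAM_FIELDING_E values from the sample data frame
toy <- tibble(INDEX = 1:6,
              TEAM_FIELDING_E = c(1011, 193, 175, 164, 138, 123))

# Values flagged as outliers by the boxplot rule
out <- boxplot.stats(toy$TEAM_FIELDING_E)$out

# filter() expects one TRUE/FALSE per row; a bare column name with %in%
# produces exactly that, so no which() or $ is needed
outlier_rows <- toy %>%
  filter(TEAM_FIELDING_E %in% out)
```

Note that filter() keeps only the matching rows, so a mutate() chained after it would return the outlier rows alone and drop the rest of the data.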
For this case, you can get what you want by using if_else inside mutate.
# Values flagged as outliers by the boxplot rule
out <- boxplot.stats(train$TEAM_FIELDING_E)$out
train %>%
  mutate(TEAM_FIELDING_E = if_else(TEAM_FIELDING_E %in% out,
                                   mean(TEAM_FIELDING_E[!(TEAM_FIELDING_E %in% out)]),
                                   TEAM_FIELDING_E))
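To check the behaviour on something small, the same idea can be run on the six TEAM_FIELDING_E values from the sample data. This is only a sketch: toy, fixed and the helper column is_outlier are hypothetical names introduced here, and the numbers apply to these six values, not to the full 2276-row dataset.

```r
library(dplyr)

# Hypothetical toy data: the six TEAM_FIELDING_E values from the sample data frame
toy <- tibble(TEAM_FIELDING_E = c(1011, 193, 175, 164, 138, 123))

# With these six values, 1011 is the only one flagged by the boxplot rule
out <- boxplot.stats(toy$TEAM_FIELDING_E)$out

fixed <- toy %>%
  mutate(
    # Helper column so the membership test is written only once
    is_outlier = TEAM_FIELDING_E %in% out,
    # Replace outliers with the mean of the non-outlier values; keep the rest
    TEAM_FIELDING_E = if_else(is_outlier,
                              mean(TEAM_FIELDING_E[!is_outlier]),
                              TEAM_FIELDING_E)
  )

fixed$TEAM_FIELDING_E
# 158.6 193.0 175.0 164.0 138.0 123.0
```

All six rows are kept, and only the outlier is replaced by 158.6, the mean of the five remaining values.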