How do I apply a function to specific columns?
I have a sample data frame like the following:
well <- c('A1','A2','A3','A4','A5')
area <- c(21000, 23400, 26800,70000,8000)
length <- c(21, 234, 26,70,22)
group<-c('WT','Control','C2','D2','E1')
data <- data.frame(well,area,length,group)
I want to apply the function below to remove rows with outliers from the data frame:
Outlier <- function(x){
low <- median(x, na.rm=TRUE)-5*(mad(x))
high <- median(x, na.rm=TRUE)+5*(mad(x))
out <- if_else(x > high, NA,ifelse(x < low, low, x))
out }
How do I apply this function to the data frame while excluding certain columns, for example "well" and "group"?
We can use lapply in base R:
data[c('area', 'length')] <- lapply(data[c('area', 'length')], Outlier)
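If it is easier to name the columns to skip than the ones to transform, the same lapply call can be driven by setdiff(). The following is a minimal sketch and not part of the original answer: it assumes the sample data from the question, and outlier_base is a hypothetical base-R-only variant of Outlier (ifelse instead of dplyr's if_else) so that the snippet runs without extra packages.

```r
# Sample data from the question
data <- data.frame(
  well   = c("A1", "A2", "A3", "A4", "A5"),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c("WT", "Control", "C2", "D2", "E1")
)

# Hypothetical base-R variant of Outlier: values above the upper bound
# become NA, values below the lower bound are replaced by that bound
outlier_base <- function(x) {
  low  <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  ifelse(x > high, NA_real_, ifelse(x < low, low, x))
}

# Columns to leave untouched; every other column is transformed
skip <- c("well", "group")
cols <- setdiff(names(data), skip)

data[cols] <- lapply(data[cols], outlier_base)
```

Under the same assumptions, the columns could also be picked by type instead of by name, for example with names(data)[sapply(data, is.numeric)].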
Or with dplyr:
library(dplyr) # 1.0.0
data %>%
  mutate(across(area:length, Outlier))
#  well  area length   group
#1   A1 21000     21      WT
#2   A2 23400     NA Control
#3   A3 26800     26      C2
#4   A4    NA     NA      D2
#5   A5  8000     22      E1
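The across() call above names the columns to include. Since the question asks about excluding columns, here is a hedged sketch of two alternative selections, by negated names and by type. It assumes dplyr 1.0.0 or later and the sample data from the question; the names by_name and by_type are only illustrative, and passing na.rm to mad() is an addition made for this sketch.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  well   = c("A1", "A2", "A3", "A4", "A5"),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c("WT", "Control", "C2", "D2", "E1")
)

# Outlier with NA_real_; na.rm is also passed to mad() here so that
# missing values do not turn both bounds into NA
Outlier <- function(x) {
  low  <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  if_else(x > high, NA_real_, ifelse(x < low, low, x))
}

# Exclude by name: apply Outlier to every column except well and group
by_name <- data %>%
  mutate(across(-c(well, group), Outlier))

# Select by type: apply Outlier to every numeric column
by_type <- data %>%
  mutate(across(where(is.numeric), Outlier))
```

For this sample data both selections pick area and length, so the two results should match. The by-type form would also pick up any other numeric column added later, which may or may not be wanted.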
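NOTE: Make sure to change the NA to NA_real_ in the 'Outlier' function. In the dplyr version used here, if_else() requires both branches to have the same type, and a bare NA is logical while the other branch is numeric.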
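Outlier <- function(x) {
  # Bounds: median +/- 5 * MAD; na.rm is passed to mad() too, otherwise a
  # single missing value would turn both bounds into NA
  low <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  # Values above `high` become NA; values below `low` are replaced by `low`
  out <- if_else(x > high, NA_real_, ifelse(x < low, low, x))
  out
}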
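Outlier does not drop anything by itself: it returns NA for values above the upper bound and replaces values below the lower bound with that bound. Since the question asks to remove the affected rows, one possible follow-up is to filter out rows where a cleaned column is NA. This is a sketch based on an assumption about the intended result, not part of the original answer; the name cleaned is illustrative, and dplyr 1.0.0 or later is assumed.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  well   = c("A1", "A2", "A3", "A4", "A5"),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c("WT", "Control", "C2", "D2", "E1")
)

# Outlier with NA_real_, as in the note above
Outlier <- function(x) {
  low  <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  if_else(x > high, NA_real_, ifelse(x < low, low, x))
}

# Mark high outliers as NA, then keep only rows where neither
# cleaned column is NA
cleaned <- data %>%
  mutate(across(area:length, Outlier)) %>%
  filter(!is.na(area), !is.na(length))
```

With the sample data this should leave the rows for wells A1, A3 and A5. Rows that already had a missing area or length before cleaning would be dropped as well, so this only fits if that is acceptable.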