How do I apply a function to specific columns

I have a sample data frame like the following:

well <- c('A1','A2','A3','A4','A5')
area <- c(21000, 23400, 26800,70000,8000)
length <- c(21, 234, 26,70,22)
group<-c('WT','Control','C2','D2','E1')

data <- data.frame(well,area,length,group)

I want to apply the function below to remove rows with outliers from the data frame:

Outlier <- function(x){
  low <- median(x, na.rm=TRUE)-5*(mad(x)) 
  high <- median(x, na.rm=TRUE)+5*(mad(x))   
  out <- if_else(x > high, NA,ifelse(x < low, low, x)) 
  out }

How do I apply this function to the data frame while excluding certain columns, for example the columns "well" and "group"?

We can use lapply in base R:

data[c('area', 'length')] <- lapply(data[c('area', 'length')], Outlier)
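
If it is easier to name the columns to skip than the columns to transform, the same lapply call can be driven by setdiff. The following is a minimal sketch, assuming the sample data from the question and the NA_real_ version of Outlier; the vector name cols is hypothetical, and dplyr is loaded only because Outlier calls if_else.

```r
library(dplyr)  # needed only for if_else() inside Outlier

# Sample data from the question
data <- data.frame(
  well   = c('A1', 'A2', 'A3', 'A4', 'A5'),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c('WT', 'Control', 'C2', 'D2', 'E1')
)

Outlier <- function(x) {
  low  <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  if_else(x > high, NA_real_, ifelse(x < low, low, x))
}

# 'cols' (hypothetical name): every column except the ones to leave alone
cols <- setdiff(names(data), c('well', 'group'))

# Transform only those columns and write them back in place
data[cols] <- lapply(data[cols], Outlier)
```

Here cols evaluates to c('area', 'length'), so the result is the same as listing those two columns by hand.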

Or with dplyr:

library(dplyr) # 1.0.0
data %>% 
     mutate(across(area:length, Outlier))
#    well  area length   group
#1   A1 21000     21      WT
#2   A2 23400     NA Control
#3   A3 26800     26      C2
#4   A4    NA     NA      D2
#5   A5  8000     22      E1
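
The range area:length depends on column order. As a sketch of two alternatives that match the "exclude well and group" wording of the question more directly, across also accepts a negated selection or a type predicate. This assumes dplyr 1.0.0 or later and the NA_real_ version of Outlier; the result names cleaned and cleaned_numeric are hypothetical.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  well   = c('A1', 'A2', 'A3', 'A4', 'A5'),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c('WT', 'Control', 'C2', 'D2', 'E1')
)

Outlier <- function(x) {
  low  <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  if_else(x > high, NA_real_, ifelse(x < low, low, x))
}

# Option 1: apply to everything except the named columns
cleaned <- data %>%
  mutate(across(!c(well, group), Outlier))

# Option 2: apply to every numeric column, whatever it is called
cleaned_numeric <- data %>%
  mutate(across(where(is.numeric), Outlier))
```

For this sample data both options touch only area and length, so they should give the same result as the area:length version above.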

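NOTE: Make sure to change NA to NA_real_ in the Outlier function. if_else() is strict about types: with dplyr 1.0.0, a plain (logical) NA in the true branch does not match the numeric false branch and raises an error.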

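Outlier <- function(x){
  # Bounds: median +/- 5 median absolute deviations (NAs ignored in both)
  low <- median(x, na.rm=TRUE)-5*(mad(x, na.rm=TRUE)) 
  high <- median(x, na.rm=TRUE)+5*(mad(x, na.rm=TRUE))   
  # Values above 'high' become NA; values below 'low' are clamped to 'low'
  out <- if_else(x > high, NA_real_,ifelse(x < low, low, x)) 
  out }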
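
The function only marks high outliers as NA; it does not drop any rows. If the goal is to actually remove those rows, as the question puts it, one possible follow-up is to filter on complete.cases afterwards. This is a sketch under that assumption, and the names num_cols and kept are hypothetical. Values below the lower bound are clamped rather than set to NA, so such rows would be kept.

```r
library(dplyr)  # needed only for if_else() inside Outlier

# Sample data from the question
data <- data.frame(
  well   = c('A1', 'A2', 'A3', 'A4', 'A5'),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c('WT', 'Control', 'C2', 'D2', 'E1')
)

Outlier <- function(x) {
  low  <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  if_else(x > high, NA_real_, ifelse(x < low, low, x))
}

# Mark high outliers as NA in the measurement columns
num_cols <- c('area', 'length')
data[num_cols] <- lapply(data[num_cols], Outlier)

# Keep only rows where none of those columns is NA
kept <- data[complete.cases(data[num_cols]), ]
```

With the sample data, wells A2 and A4 contain an NA after the transformation, so kept holds the rows for A1, A3 and A5.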
