
Applying functions over rows

I am using the R programming language. I am interested in seeing if it is possible to apply a function over the whole row.

For instance, suppose I have a data frame like this:

var_1 <- rnorm(10000, 1, 4)
var_2 <- rnorm(10000, 10, 5)
var_3 <- sample(LETTERS[1:4], 10000, replace = TRUE, prob = c(0.1, 0.2, 0.65, 0.05))
response_variable <- sample(LETTERS[1:2], 10000, replace = TRUE, prob = c(0.4, 0.6))


#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response_variable)

#declare var_3 and response_variable as factors
f$response_variable = as.factor(f$response_variable)
f$var_3 = as.factor(f$var_3)

(In base R) Is it possible to write a command such as "select rows where the values of var_1 and var_2 are both greater than 2" (i.e. select rows where the minimum value in that row is greater than 2)? I could write an "if else" statement for each column individually, but suppose there are many columns. Is it possible to do this without specifying every column?

Similarly, is it possible to apply a function to multiple columns at the same time?

Suppose there is the following function:

ihs <- function(x) {
    y <- log(x + sqrt(x ^ 2 + 1))
    return(y)
}

I could write:

f$var_1 = ihs(f$var_1)
f$var_2 = ihs(f$var_2)

But is there a quicker way (when there are more columns) to apply the function "ihs" on the whole table (where applicable)?

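In base R, this can be done with lapply after selecting the columns of interest ('nm1'). Note that a name pattern such as '^var_\\d+$' would also match the factor column var_3, for which ihs is not meaningful, so the columns are selected by type here:

nm1 <- names(f)[sapply(f, is.numeric)]
f[nm1] <- lapply(f[nm1], ihs)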
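
A minimal sketch of the lapply approach on a small, hypothetical data frame (toy, original and num_cols are illustrative names, not from the post). It selects columns by type so that the factor column is skipped, and compares the result with base R's asinh(), which computes the same inverse hyperbolic sine:

```r
# ihs as defined in the question (inverse hyperbolic sine)
ihs <- function(x) {
  log(x + sqrt(x^2 + 1))
}

# Hypothetical toy data frame: two numeric columns and one factor
toy <- data.frame(
  var_1 = c(-2, 0, 3.5),
  var_2 = c(10, 0.5, 7),
  var_3 = factor(c("A", "C", "B"))
)
original <- toy

# Select numeric columns by type, then transform them all in one step
num_cols <- names(toy)[vapply(toy, is.numeric, logical(1))]
toy[num_cols] <- lapply(toy[num_cols], ihs)

# The factor column is untouched, and ihs agrees with asinh()
identical(toy$var_3, original$var_3)
all.equal(toy$var_1, asinh(original$var_1))
```

For very large negative inputs the hand-written formula may lose precision, so asinh() is probably the safer choice in practice.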

If the function should instead be applied by type, i.e. to the numeric columns only, and only in rows where the minimum value across those columns is greater than 2, first build a column index (i1) and a row index (i2):

i1 <- sapply(f, is.numeric)
i2 <- do.call(pmin, f[i1]) > 2

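The row index can also be built with rowSums. The count is compared against sum(i1), the number of numeric columns, because length(i1) counts every column in f and would never match:

i2 <- rowSums(f[i1] > 2) == sum(i1)

Either way, the function is then applied to the selected rows and columns:

f[i2, i1] <- lapply(f[i2, i1], ihs)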
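
The row-index idea can be checked on a small, hypothetical data frame (toy, is_num, rows_pmin and rows_rowsums are illustrative names). This sketch builds the index both ways, confirms that they agree, and transforms only the qualifying rows:

```r
ihs <- function(x) log(x + sqrt(x^2 + 1))

# Hypothetical toy data: only rows 1 and 4 have every numeric value above 2
toy <- data.frame(
  var_1 = c(3, 5, 1, 8),
  var_2 = c(4, 1, 1, 9),
  var_3 = factor(c("A", "B", "C", "D"))
)

# Logical column index: TRUE for numeric columns
is_num <- vapply(toy, is.numeric, logical(1))

# Two equivalent row indices: row-wise minimum vs. count of values above 2
rows_pmin    <- do.call(pmin, toy[is_num]) > 2
rows_rowsums <- unname(rowSums(toy[is_num] > 2) == sum(is_num))
all(rows_pmin == rows_rowsums)

# Transform only the qualifying rows of the numeric columns
toy[rows_pmin, is_num] <- lapply(toy[rows_pmin, is_num], ihs)
```

rowSums returns a vector named by row, hence the unname() before comparing. Rows 2 and 3 keep their original values because at least one of their numeric entries is not above 2.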

Or, with the tidyverse, to transform every numeric column:

library(dplyr)
f <- f %>%
  mutate(across(where(is.numeric), ihs))

To select rows where var_1 and var_2 are both greater than 2, you can do:

subset(f, var_1 > 2 & var_2 > 2)

The same condition can be used in dplyr::filter:

dplyr::filter(f, var_1 > 2 & var_2 > 2)
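
When there are many columns, writing out each comparison becomes tedious. One possible sketch, assuming the columns to test are held in a character vector (cols, toy, keep and kept_base are hypothetical names), combines one logical vector per column in base R. The dplyr variant uses if_all(), which to my knowledge requires dplyr 1.0.4 or later, so it is guarded here:

```r
# Hypothetical toy data
toy <- data.frame(
  var_1 = c(3, 5, 1, 8),
  var_2 = c(4, 1, 1, 9),
  var_3 = factor(c("A", "B", "C", "D"))
)

# Columns to test, kept in a character vector so the list can grow freely
cols <- c("var_1", "var_2")

# Base R: build one logical vector per column and AND them together
keep <- Reduce(`&`, lapply(toy[cols], function(col) col > 2))
kept_base <- toy[keep, ]

# dplyr equivalent, run only if a recent enough dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.0.4") {
  kept_dplyr <- dplyr::filter(
    toy,
    dplyr::if_all(dplyr::all_of(cols), ~ .x > 2)
  )
}
```

To test every numeric column instead of a fixed list, cols could be replaced by names(toy)[sapply(toy, is.numeric)].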
