Log Transform many variables in R with loop

I have a data frame that has a binary variable for diagnosis (column 1) and 165 nutrient variables (columns 2-166) for n=237. Let's call this dataset nutr_all. I need to create 165 new variables that take the natural log of each of the nutrient variables. So, I want to end up with a data frame that has 331 columns - column 1 = diagnosis, cols 2-166 = nutrient variables, cols 167-331 = log transformed nutrient variables. I would like these variables to take the name of the old variables but with "_log" at the end

I have tried using a for loop and the mutate command, but I'm not very well versed in R, so I am struggling quite a bit.

for (nutr in (nutr_all_nomiss[,2:166])){
 nutr_all_log <- mutate(nutr_all, nutr_log = log(nutr) )
}

When I do this, it just creates a single new variable called nutr_log. I know I need to let R know that the "nutr" in "nutr_log" is the variable name in the for loop, but I'm not sure how.

For anyone finding this page more recently: dplyr::across() was introduced in dplyr 1.0.0 (released in 2020) and is built for exactly this task, applying the same transformation to many columns at once.

A simple solution is below.

If you need to be selective about which columns you want to transform, check out the tidyselect helper functions by running ?tidyr_tidy_select in the R console.

library(tidyverse)
# create vector of column names
variable_names <- paste0("nutrient_variable_", 1:165)

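# create random data for example: a list of 165 integer vectors
# (purrr::rerun() is deprecated as of purrr 1.0.0, so replicate() is used here)
data_values <- replicate(n = 165,
                         expr = sample(x = 100,
                                       size = 237,
                                       replace = TRUE),
                         simplify = FALSE)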

# set names of the columns, coerce to a tibble, 
# and add the diagnosis column
nutr_all <- data_values %>%
    set_names(variable_names) %>%
    as_tibble() %>% 
    mutate(diagnosis = 1:237) %>% 
    relocate(diagnosis, .before = everything())
    
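# use across to perform same transformation on all columns 
# whose names contain the phrase 'nutrient_variable'
# (this example uses log10; for the natural log the question asks for,
# use .fns = list(log = log), which gives a "_log" suffix)
nutr_all_with_logs <- nutr_all %>%
    mutate(across(
        .cols = contains('nutrient_variable'),
        .fns = list(log10 = log10),
        .names = "{.col}_{.fn}"))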

# print out a small sample of data to validate 
nutr_all_with_logs[1:5, c(1, 2:3, 166:168)]

Personally, instead of adding all the columns to the data frame, I would prefer to make a new data frame that contains only the transformed values, and change the column names:

logs_only <- nutr_all %>%
    mutate(across(
        .cols = contains('nutrient_variable'),
        .fns = log10)) %>% 
    rename_with(.cols = contains('nutrient_variable'),
                .fn = ~paste0(., '_log10'))
logs_only[1:5, 1:3]
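
Since the question asks for the natural log and a "_log" suffix rather than log10, here is a minimal sketch of the same across() approach adapted to that. The data set nutr_small and its column names are made up for illustration; only mutate(), across(), starts_with() and tibble() from dplyr are used.

```r
library(dplyr)

# Hypothetical small data set: a diagnosis column plus three nutrient columns
set.seed(42)
nutr_small <- tibble(
  diagnosis           = rep(c(0, 1), times = 5),
  nutrient_variable_1 = runif(10, min = 1, max = 100),
  nutrient_variable_2 = runif(10, min = 1, max = 100),
  nutrient_variable_3 = runif(10, min = 1, max = 100)
)

# Keep the original columns and append one natural-log column per nutrient,
# named <original>_log
nutr_small_logs <- nutr_small %>%
  mutate(across(
    .cols  = starts_with("nutrient_variable"),
    .fns   = log,
    .names = "{.col}_log"
  ))

names(nutr_small_logs)
```

With the real data, the same call applied to nutr_all should give the 331-column layout described in the question, provided the selection helper matches exactly the 165 nutrient columns.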

We can use mutate_at

library(dplyr)
# naming the function "log" makes each new column <original>_log;
# the original columns are kept
nutr_all_log <- nutr_all_nomiss %>%
                    mutate_at(2:166, list(log = ~ log(.)))

In base R, we can do this directly on the data.frame

# build the new column names, then take the log of the original columns
nm1 <- paste0(names(nutr_all_nomiss)[2:166], "_log")
nutr_all_nomiss[nm1] <- log(nutr_all_nomiss[2:166])

In base R, we can use lapply:

nutr_all_nomiss[paste0(names(nutr_all_nomiss)[2:166], "_log")] <- lapply(nutr_all_nomiss[2:166], log)
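
For reference, the for loop from the question can also be made to work by iterating over column names instead of column values, so that each new column can be named after its source. This is only a sketch: nutr_demo and its columns (iron, zinc, folate) are hypothetical stand-ins for nutr_all_nomiss.

```r
# Hypothetical stand-in for nutr_all_nomiss: diagnosis plus three nutrients
set.seed(1)
nutr_demo <- data.frame(
  diagnosis = sample(c(0, 1), size = 10, replace = TRUE),
  iron      = runif(10, min = 1, max = 10),
  zinc      = runif(10, min = 1, max = 10),
  folate    = runif(10, min = 1, max = 10)
)

# names(nutr_demo)[-1] is evaluated once, before the loop starts,
# so the columns added inside the loop are not visited again
for (nutr in names(nutr_demo)[-1]) {
  nutr_demo[[paste0(nutr, "_log")]] <- log(nutr_demo[[nutr]])
}

names(nutr_demo)
```

The key change from the original attempt is that nutr holds a name (a string), which can be pasted into the new column name and used with [[ ]] to fetch the values.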

Here is a solution using only base R:

First I will create a dataset equivalent to yours:

nutr_all <- data.frame(
  diagnosis = sample(c(0, 1), size = 237, replace = TRUE)
)

for(i in 2:166){
  nutr_all[i] <- runif(n = 237, 1, 10)
  names(nutr_all)[i] <- paste0("nutrient_", i-1)
}

Now let's create the new variables and append them to the data frame:

nutr_all_log <- cbind(nutr_all, log(nutr_all[, -1]))

And this takes care of the names:

names(nutr_all_log)[167:331] <- paste0(names(nutr_all[-1]), "_log")

The function below uses dplyr to log-transform all numeric variables in a dataset. It also checks whether a column contains non-positive values; as currently written, it does not calculate the log for those columns.

logTransformation <- function(ds)
{
  # this function creates log transformations of a data frame, for only
  # those variables whose values are all positive
  # args:
    # ds : Dataset

  require(dplyr)
  # inherits() also accepts tibbles, whose class() has more than one element
  if (!inherits(ds, "data.frame")) { stop("ds must be a data frame") }

  ds <- ds %>% 
    dplyr::select_if(is.numeric)

  # to get only positive variables (minimum strictly greater than 0)
  varList <- names(ds)[sapply(ds, function(x) min(x, na.rm = TRUE)) > 0]

  # keep the positive variables and add a "<name>_log" column for each
  ds <- ds %>% 
    dplyr::select(dplyr::all_of(varList)) %>% 
    dplyr::mutate(
      dplyr::across(dplyr::all_of(varList), log, .names = "{.col}_log"))

  return(ds)
}

You can use it for your case as:

# assuming your binary variable is named binaryVar
nutr_allTransformed <- nutr_all %>% dplyr::select(-binaryVar) %>% logTransformation()

If you want to include variables with non-positive values too, replace varList as below (note that log() returns NaN for negative values and -Inf for zero):

varList<- names(ds)
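
If non-positive values are included, one option is to turn them into NA instead of letting log() produce NaN or -Inf. The sketch below is not part of the function above; safe_log is a hypothetical helper name, and treating non-positive values as missing is an assumption that may not suit every analysis.

```r
# Hypothetical helper: natural log that returns NA for values <= 0 or missing
safe_log <- function(x) {
  out <- rep(NA_real_, length(x))
  ok  <- !is.na(x) & x > 0      # positions where the log is defined
  out[ok] <- log(x[ok])
  out
}

# Small made-up example with a zero, a negative value and a missing value
df <- data.frame(a = c(10, 1, 0, -5, NA), b = c(2, 4, 8, 16, 32))

# Append one "<name>_log" column per original column
df[paste0(names(df), "_log")] <- lapply(df, safe_log)
names(df)
```

Unlike calling log() directly, this emits no warnings, so it is worth counting the NA values afterwards to see how many observations were affected.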
