
Multiply several columns of a data frame by a factor (scalar)

I have a very basic problem and can't find a solution, so sorry in advance for the beginner question.

I have a data frame with several ID columns and 30 numerical columns. I want to multiply all values in those 30 columns by the same factor and keep the rest of the data frame unchanged. I figured that dplyr with transmute_all or transmute_at would be my friend, but I can't find a way to express the function Column1:Column30 * factor. All the examples I found use simple functions like mean, which doesn't help me with the expression.

I would use mutate_at. For example:

library(dplyr)

# funs() is deprecated in current dplyr; a formula lambda (~ . * 3) does the same job
mtcars %>% 
  mutate_at(vars(mpg:qsec), ~ . * 3)
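Since dplyr 1.0.0, scoped verbs such as mutate_at() are superseded by across() inside mutate(). A minimal sketch of the same operation, assuming dplyr 1.0.0 or later is installed:

```r
library(dplyr)

# Multiply every column from mpg through qsec by 3; all other columns
# (vs, am, gear, carb) are kept unchanged. .x is the current column.
scaled <- mtcars %>%
  mutate(across(mpg:qsec, ~ .x * 3))

head(scaled)
```

For the question's data frame the call would read mutate(across(Column1:Column30, ~ .x * factor)), where factor is your scalar.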

I'll give a solution with data.table; the dplyr version should be nearly identical.

library(data.table)
# convert to data.table format to use data.table syntax
setDT(my_df) 

# .SD refers to all the columns mentioned in the .SDcols argument
#   (all columns by default when this argument is not specified)
# - instead of using backticks around *, you could use quotes: "*"
# - `factor` is your scalar; the result contains only the scaled columns
my_df[ , lapply(.SD, `*`, factor), .SDcols = Column1:Column30]
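If you want the scaled columns written back into my_df while the ID columns stay as they are, data.table's := operator can update them by reference. A minimal sketch with hypothetical data, where three columns stand in for Column1:Column30 and scalar stands in for factor:

```r
library(data.table)

# Hypothetical data: two ID columns plus three numeric columns
my_df <- data.table(id1 = 1:3, id2 = c("a", "b", "c"),
                    Column1 = c(1, 2, 3), Column2 = c(4, 5, 6),
                    Column3 = c(7, 8, 9))
scalar <- 10

# Names of the columns to scale (for the question: paste0("Column", 1:30))
cols <- paste0("Column", 1:3)

# (cols) := ... overwrites exactly those columns in place;
# id1 and id2 are not touched
my_df[, (cols) := lapply(.SD, `*`, scalar), .SDcols = cols]

print(my_df)
```

Wrapping cols in parentheses tells data.table to use the character vector's contents as the target column names rather than creating a column literally named "cols".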

On some made-up data

set.seed(0123498)
# create fake data
DT = setDT(replicate(8, rnorm(5), simplify = FALSE))
DT
#            V1          V2         V3          V4         V5         V6        V7         V8
# 1: -0.2685077 -1.06491111  0.7307661  0.09880937  0.2791274 -0.5589676 1.5320685  0.4730013
# 2:  1.0783236 -0.17810929 -0.2578453  0.95940860  1.0990367 -0.6983235 0.9530062 -1.3800769
# 3:  1.1730611 -0.48828441 -1.6314077 -0.76117268 -0.5753245 -0.7370099 0.3982160 -0.8088035
# 4:  0.2060451 -0.07105785 -1.1878591 -0.83464592  2.1872117 -0.4390479 0.1428239  1.2634280
# 5:  1.6142695  0.46381602  0.5315299  2.34790945 -1.2977851  1.0428450 1.9292390  0.5337248
scalar = 3
DT[ , lapply(.SD, "*", scalar), .SDcols = V4:V6]
#            V4         V5        V6
# 1:  0.2964281  0.8373822 -1.676903
# 2:  2.8782258  3.2971101 -2.094970
# 3: -2.2835180 -1.7259734 -2.211030
# 4: -2.5039378  6.5616352 -1.317144
# 5:  7.0437283 -3.8933554  3.128535

Maybe this will help; it uses only base R:

> set.seed(100)
> df = data.frame(id=rep(1:5), val1=rnorm(5), val2=rnorm(5), val3=rnorm(5))
> df

  id        val1       val2        val3
1  1 -0.50219235  0.3186301  0.08988614
2  2  0.13153117 -0.5817907  0.09627446
3  3 -0.07891709  0.7145327 -0.20163395
4  4  0.88678481 -0.8252594  0.73984050
5  5  0.11697127 -0.3598621  0.12337950

# Multiply all columns except the id column by 2
> df[, !colnames(df) %in% c("id")] <- df[, !colnames(df) %in% c("id")] * 2
> df
  id       val1       val2       val3
1  1 -1.0043847  0.6372602  0.1797723
2  2  0.2630623 -1.1635814  0.1925489
3  3 -0.1578342  1.4290654 -0.4032679
4  4  1.7735696 -1.6505189  1.4796810
5  5  0.2339425 -0.7197243  0.2467590
> 
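If the columns to scale form a contiguous block, as Column1:Column30 does in the question, base R can also select them by a name range instead of listing what to exclude. A minimal sketch with hypothetical data, using match() to find the positions of the first and last columns (k is an illustrative name for the scalar):

```r
# Hypothetical data: an id column followed by val1..val3
df <- data.frame(id = 1:5,
                 val1 = c(1, 2, 3, 4, 5),
                 val2 = c(2, 4, 6, 8, 10),
                 val3 = c(0.5, 1, 1.5, 2, 2.5))
k <- 2   # the scalar multiplier

# Positions of the first and last columns in the block,
# the base R analogue of val1:val3
cols <- match("val1", names(df)):match("val3", names(df))

# Arithmetic on a data frame is element-wise, so the whole block
# is scaled in one step and written back; id is unchanged
df[cols] <- df[cols] * k
df
```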

If you want to multiply all numeric columns (or can easily write a test for the columns you want), I'd use lapply with an is.numeric test:

Calling the data frame dd (and using iris to demonstrate):

dd = iris
# assigning to dd[] keeps dd a data.frame (lapply on its own returns a plain list)
dd[] = lapply(dd, FUN = function(x) if (is.numeric(x)) return(x * 2) else return(x))

This is equivalent to a simple for loop, which also works just fine.

for (i in seq_along(dd)) {
    if (is.numeric(dd[[i]])) dd[[i]] = dd[[i]] * 2
}
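The is.numeric test can also be computed once for all columns and used as a logical index, which avoids the if/else inside a function. A minimal sketch, again on iris:

```r
dd <- iris

# TRUE for each numeric column, FALSE otherwise (Species is a factor)
is_num <- vapply(dd, is.numeric, logical(1))

# Scale only the numeric columns; Species is left as it is
dd[is_num] <- dd[is_num] * 2
head(dd)
```

vapply() is used instead of sapply() because it guarantees a logical vector even for edge cases such as a data frame with no columns.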

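You could also use apply, but only on the columns you want to scale: apply() first converts its input to a matrix, so a character ID column would turn every value into character and the multiplication would fail.

my_df <- data.frame(id = c("a", "b", "c"), x = c(1, 2, 3), y = c(4, 5, 6))  # hypothetical example data

# apply() returns a matrix, so write it back into a copy of the data frame
my_scaled_df <- my_df
my_scaled_df[c("x", "y")] <- apply(my_df[c("x", "y")], 2, function(col) col * 3)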
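apply() converts a data frame to a matrix before working on it. The following check, with hypothetical data, shows what that does when a character ID column is still included, and why restricting apply() to the numeric columns avoids the problem:

```r
# Hypothetical data: one character ID column and two numeric columns
df <- data.frame(id = c("a", "b"), x = c(1, 2), y = c(3, 4),
                 stringsAsFactors = FALSE)

# apply() calls as.matrix() on the data frame first; because of the
# character column, the resulting matrix is character throughout
whole <- apply(df, 2, identity)
print(typeof(whole))

# So multiplying inside apply() fails on the full data frame
err <- tryCatch(apply(df, 2, function(col) col * 3),
                error = function(e) conditionMessage(e))
print(err)

# Restricting apply() to the numeric columns gives a numeric matrix
num <- apply(df[c("x", "y")], 2, function(col) col * 3)
print(num)
```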

For this you can try:

y <- xx[-(1:2)] * 100

Here xx[-(1:2)] drops the first two columns, which are assumed to be the non-numeric ones, so they are excluded from the calculation.
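Note that y holds only the scaled columns. To keep the first two columns and update the rest in place, the same negative index can be used on the left-hand side of the assignment. A minimal sketch with hypothetical data whose first two columns are non-numeric IDs:

```r
# Hypothetical data: two non-numeric ID columns followed by numeric columns
xx <- data.frame(id = c("a", "b"), group = c("g1", "g2"),
                 v1 = c(1, 2), v2 = c(0.5, 0.25),
                 stringsAsFactors = FALSE)

# xx[-(1:2)] selects every column except the first two;
# assigning to it replaces only those columns
xx[-(1:2)] <- xx[-(1:2)] * 100
xx
```

This relies on the ID columns being the first two; if their positions can change, selecting by name is more robust.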

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
