
R, how to replace only the numeric values of a dataframe?

I am working with R 3.4.3 on Windows 10. I have a data frame containing both numeric and character columns. I would like to replace only the numeric values, but when I do that the character values are replaced as well.

How can I edit my function to make it affect only the numeric values and not the characters?

Here is the code of my function:

dataframeChange <- function(dFrame){
  thresholdVal <- 20
  dFrame[dFrame >= thresholdVal] <- -1
  return(dFrame)
  }

Here is a dataframe example:

example_df <- data.frame(
   myNums = c (1:5), 
   myChars = c("A","B","C","D","E"),
   stringsAsFactors = FALSE
 )

Thanks for the help!

As Tim's comment points out, you need to know which columns are numeric. They can be located with ind <- sapply(dFrame, is.numeric):

dataframeChange <- function(dFrame){
  thresholdVal <- 20
  # logical vector marking the numeric columns
  ind <- sapply(dFrame, is.numeric)
  # compare and replace within the numeric columns only
  dFrame[ind][dFrame[ind] >= thresholdVal] <- -1
  return(dFrame)
}
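The same idea can also be written column by column, which makes it easy to see that character columns are never compared at all. This is a minimal sketch in base R; the name replaceNumeric, its arguments, and the sample values are only for illustration:

```r
# Hypothetical helper: replace values >= thresholdVal in numeric columns only.
replaceNumeric <- function(dFrame, thresholdVal = 20, newVal = -1) {
  ind <- sapply(dFrame, is.numeric)             # TRUE for numeric columns
  dFrame[ind] <- lapply(dFrame[ind], function(col) {
    replace(col, col >= thresholdVal, newVal)   # update matching entries
  })
  dFrame
}

test_df <- data.frame(
  myNums  = c(10, 20, 30, 5, 70),
  myChars = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)
res <- replaceNumeric(test_df)
res
```

In the function from the question, dFrame >= thresholdVal also compares the character column, and it does so as strings, which is why those values were replaced too.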

Use mutate_if from dplyr:

library(dplyr)

example_df %>% mutate_if(is.numeric, funs(if_else(. >= thresh, repl, .)))

  myNums myChars
1     10       A
2     -1       B
3     -1       C
4      5       D
5     -1       E

Explanation:

  • The mutate family of functions creates new variables or updates existing ones.
  • With mutate_if, the functions (specified within funs()) are applied only to columns that satisfy the first argument (in this case, is.numeric()).
  • The updating function is a simple if_else clause based on the OP's rule.
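In dplyr 1.0.0 and later, the same conditional update can also be written with across() and where(), which supersede the scoped mutate_if() and funs() style. This is a minimal sketch, reusing thresh, repl and example_df as defined in the Data section of this answer (repeated here so the block runs on its own); the name result is only for illustration:

```r
library(dplyr)

thresh <- 20
repl <- -1.0

example_df <- data.frame(
  myNums = c(10, 20, 30, 5, 70),
  myChars = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# apply the replacement rule to every numeric column, leave the rest alone
result <- example_df %>%
  mutate(across(where(is.numeric), ~ if_else(.x >= thresh, repl, .x)))

result
```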

Data:

thresh <- 20
repl <- -1.0

example_df <- data.frame(
   myNums = c(10,20,30,5,70), 
   myChars = c("A","B","C","D","E"),
   stringsAsFactors = FALSE
 ) 

example_df
  myNums myChars
1     10       A
2     20       B
3     30       C
4      5       D
5     70       E

With data.table we can avoid explicit loops, and it is faster. Here I've set the threshold value to 2:

library(data.table)

# convert the data frame to a data.table by reference
setDT(example_df)

# get the names of the numeric columns
num_cols <- names(example_df)[sapply(example_df, is.numeric)]

# update all numeric columns in one call
example_df[, (num_cols) := lapply(.SD, function(x) ifelse(x > 2, -1, x)), .SDcols = num_cols]

print(example_df)

   myNums myChars
1:      1       A
2:      2       B
3:     -1       C
4:     -1       D
5:     -1       E

Another data.table solution.

library(data.table)

dataframeChange_dt <- function(dFrame){
    setDT(dFrame)
    # update each column by reference where its value is below 20
    for(j in seq_along(dFrame)){
       set(dFrame, i = which(dFrame[[j]] < 20), j = j, value = -1)
    }
}

dataframeChange_dt(example_df)

example_df
#    myNums myChars
# 1:     -1       A
# 2:     20       B
# 3:     30       C
# 4:     -1       D
# 5:     70       E

It does not explicitly restrict itself to the numeric columns; however, I tested it on multiple datasets and it did not affect the non-numeric columns.
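To make the restriction to numeric columns explicit, rather than relying on how the comparison behaves for character values, the loop can skip non-numeric columns. This is a minimal sketch, assuming the rule from the question (values >= 20 become -1); the names replace_numeric_dt and example_dt and the function's arguments are hypothetical:

```r
library(data.table)

# Hypothetical helper: replace values >= threshold in numeric columns only.
replace_numeric_dt <- function(dt, threshold = 20, replacement = -1) {
  for (j in seq_along(dt)) {
    if (!is.numeric(dt[[j]])) next        # leave non-numeric columns untouched
    rows <- which(dt[[j]] >= threshold)   # row indices to update
    set(dt, i = rows, j = j, value = replacement)
  }
  invisible(dt)
}

example_dt <- data.table(
  myNums  = c(10, 20, 30, 5, 70),
  myChars = c("A", "B", "C", "D", "E")
)
replace_numeric_dt(example_dt)   # modifies example_dt by reference
example_dt
```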
