Calculate variable stored as character in R

R: How to calculate a variable that is stored as a character string?

I want to get the result as a vector of numeric values. However, when I read my df from a CSV file, every element that contains a mix of letters and numbers (the letters are placeholders to be substituted with certain values when needed) is read in as character. Any idea how to avoid or solve that?
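On the reading side, a cell such as v-3 has no numeric representation, so these columns will arrive as text whatever you do; what you can control is that they arrive consistently. A minimal sketch, assuming a CSV laid out like the example below (the inline csv_text is hypothetical and stands in for a real file): colClasses = "character" reads every column as character, which also prevents factor conversion in R versions before 4.0, where stringsAsFactors defaulted to TRUE.

```r
# Hypothetical stand-in for the contents of a real CSV file
csv_text <- "c1,c2
v-3,1-v
v,0"

# Read every column as character so that cells like "v-3" and "0"
# share one type and nothing is turned into a factor
df <- read.csv(text = csv_text, colClasses = "character")
str(df)
```

The columns can then be evaluated later, once the value of v is known.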

The code below simulates my problem:

# create two vectors and bind them into a df
c1 <- c("v-3", "v")
c2 <- c("1-v", 0)  # 0 is coerced to "0", so c2 is character as well
df <- data.frame(c1, c2)
df
##    c1  c2
## 1 v-3 1-v
## 2   v   0
# I would like to substitute "v" with a number
v <- 2
df
##    c1  c2
## 1 v-3 1-v
## 2   v   0

Now, how can I change the class of the elements of the df so that "v" can be substituted and the values calculated? Or can I read the CSV in a way that stores my mix of characters and numbers in a more convenient form?

Thanks in advance. Greg
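Assigning v <- 2 leaves df unchanged because the v inside "v-3" is just a letter in a string, with no link to the R variable v. A minimal base-R sketch of what evaluating one cell involves: parse() turns the text into an R expression and eval() computes it. Here the value of v is supplied through eval()'s envir argument as a list, an alternative to text substitution that neither answer below uses; list(v = 2) is illustrative.

```r
# One cell from the example data frame
cell <- "v-3"

# parse() converts the text into an unevaluated R expression;
# eval() computes it, finding v in the supplied list
value <- eval(parse(text = cell), envir = list(v = 2))
value
```

Because v is looked up as a symbol rather than matched as text, this cannot accidentally alter longer names that happen to contain the letter v. Applying it to a whole data frame means calling it once per cell, for example with lapply() over the columns and vapply() within each column.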

You can use str_replace() from stringr to substitute the value, and then map_dbl() from purrr with eval(parse(text = ...)) to evaluate each expression.


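library(dplyr)
library(stringr)  # str_replace()
library(purrr)    # map_dbl()

df %>%
  mutate(
    # Put the value of v into every cell (str_replace() changes only the
    # first match; use str_replace_all() if a cell can contain v twice)
    across(everything(), ~ str_replace(.x, "v", "2")),
    # Parse and evaluate each cell, returning one number per cell
    across(everything(), ~ map_dbl(.x, function(to_eval) eval(parse(text = to_eval))))
  )
##   c1 c2
## 1 -1 -1
## 2  2  0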
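If you would rather not load dplyr, stringr and purrr, the same substitute-then-evaluate idea can be written in base R. This is a sketch of an equivalent, not part of the original answer: gsub() stands in for str_replace(), vapply() stands in for map_dbl(), and eval_column is a hypothetical helper name.

```r
df <- data.frame(c1 = c("v-3", "v"), c2 = c("1-v", "0"))

# Evaluate one character column: substitute the value, then parse and
# evaluate each cell separately, requiring a single number per cell
eval_column <- function(col, patt, repl) {
  exprs <- gsub(patt, repl, col, fixed = TRUE)
  vapply(exprs, function(s) eval(parse(text = s)), numeric(1),
         USE.NAMES = FALSE)
}

# df[] <- keeps the data.frame structure and the column names
df[] <- lapply(df, eval_column, patt = "v", repl = "2")
df
```

vapply() fails loudly if any cell does not evaluate to exactly one number, which is a useful check on messy CSV input.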

This might be a more efficient way to do what you're after:

Write a small function that:

  1. Uses gsub to replace the letter with a value.
  2. Writes the result to a tempfile.
  3. Parses the tempfile.
  4. Evaluates the parsed expressions and inserts the results back into the structure of your original data.frame.
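Step 1 uses gsub() rather than sub(), which matters when a cell mentions the variable more than once. A small illustration with a hypothetical cell "v*v" (not in the question's data):

```r
cell <- "v*v"

# sub() replaces only the first match
first_only <- sub("v", "2", cell, fixed = TRUE)
# gsub() replaces every match
all_matches <- gsub("v", "2", cell, fixed = TRUE)

first_only   # "2*v"
all_matches  # "2*2"
```

stringr draws the same line: str_replace() behaves like sub(), and str_replace_all() like gsub().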

Here's the function:

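fun <- function(df, patt, repl, fixed = TRUE) {
  # Temporary file that will hold one expression per line
  fil <- tempfile()
  on.exit(unlink(fil))  # remove the file when the function exits
  # Substitute in every cell; writeLines() writes the matrix column by column
  writeLines(gsub(patt, repl, as.matrix(df), fixed = fixed), con = fil)
  # Parse all lines at once, evaluate each expression, and refill df
  # column by column so each result lands in its original cell
  df[] <- sapply(parse(fil), eval)
  df
}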

Here's how you'd use the function:

fun(df, "v", 2)
##   c1 c2
## 1 -1 -1
## 2  2  0
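Steps 2 and 3 round-trip through a temporary file, but parse() also accepts a character vector through its text argument, treating each element as a line, so the file can probably be skipped. A sketch of that variant; fun_text is a hypothetical name, and like fun() it assumes every cell holds exactly one expression (no embedded newlines or semicolons), so that the parsed expressions line up one-to-one with the cells.

```r
fun_text <- function(df, patt, repl, fixed = TRUE) {
  # Substitute in every cell; c() flattens the matrix column by column
  txt <- c(gsub(patt, repl, as.matrix(df), fixed = fixed))
  # Parse all cells in a single call: one expression per element
  exprs <- parse(text = txt)
  # Evaluate each expression and refill df column by column,
  # so every result lands back in its original cell
  df[] <- sapply(exprs, eval)
  df
}

df <- data.frame(c1 = c("v-3", "v"), c2 = c("1-v", "0"))
fun_text(df, "v", 2)
##   c1 c2
## 1 -1 -1
## 2  2  0
```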

Here's a timing comparison with the other answer on a larger dataset (10,000 rows by 20 columns):

options <- c("v-3", "v", "v*2", "1-v", "v/5", 0, "v+2")
nrow <- 10000
ncol <- 20
set.seed(1)
df <- data.frame(matrix(sample(options, nrow*ncol, TRUE),
                        nrow = nrow, ncol = ncol))

library(dplyr)
library(stringr)
library(purrr)

fun2 <- function(df, patt, repl) {
  # df = input data.frame
  # patt = pattern to search for
  # repl = replacement value (as character)
  df %>%
    mutate(
      across(everything(), ~ str_replace(.x, patt, repl)),
      across(everything(), ~ map_dbl(.x, function(to_eval) eval(parse(text = to_eval))))
    )
}

library(microbenchmark)
microbenchmark(fun(df, "v", 2), fun2(df, "v", "2"), times = 10)
# Unit: milliseconds
#                expr      min        lq     mean   median       uq      max neval cld
#     fun(df, "v", 2)  831.731  924.9648 1159.544 1012.590 1366.072 1882.586    10  a 
#  fun2(df, "v", "2") 4471.800 4721.3587 4847.252 4853.269 4959.595 5157.823    10   b
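The answer does not say where the speed-up comes from. One plausible reading (an interpretation, not something the benchmark isolates) is that fun() hands parse() the whole table in one call, whereas fun2() calls parse() once per cell, 200,000 times for the 10,000-by-20 data. The base-R sketch below compares just those two strategies on a smaller table generated the same way and checks that they agree; parse_once and parse_each are hypothetical names, and timings will vary by machine.

```r
# Same kind of data as the benchmark, but smaller (1,000 x 20)
options_vec <- c("v-3", "v", "v*2", "1-v", "v/5", "0", "v+2")
set.seed(1)
m <- matrix(sample(options_vec, 1000 * 20, TRUE), nrow = 1000, ncol = 20)
df <- data.frame(m, stringsAsFactors = FALSE)

# Strategy 1: substitute, then parse every cell in a single call
parse_once <- function(df) {
  txt <- c(gsub("v", "2", as.matrix(df), fixed = TRUE))
  df[] <- sapply(parse(text = txt), eval)
  df
}

# Strategy 2: substitute, then parse and evaluate cell by cell
parse_each <- function(df) {
  df[] <- lapply(df, function(col) {
    vapply(gsub("v", "2", col, fixed = TRUE),
           function(s) eval(parse(text = s)),
           numeric(1), USE.NAMES = FALSE)
  })
  df
}

t_once <- system.time(r_once <- parse_once(df))
t_each <- system.time(r_each <- parse_each(df))

# Both strategies must give the same numbers
stopifnot(isTRUE(all.equal(r_once, r_each)))

# Elapsed seconds for each strategy
c(parse_once = t_once[["elapsed"]], parse_each = t_each[["elapsed"]])
```

If parse_each turns out much slower than parse_once here, that supports the per-call parse overhead explanation; if not, the dplyr and purrr layers in fun2() are the more likely cost.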
