I want to get a solution as a vector of numeric values. However, when I read my df from a csv, every element ends up stored as a character string, because the cells contain a mix of characters and numbers (the characters are placeholders to be substituted with certain values when needed). Any idea how to avoid/solve that?
This code below just simulates my problem:
#create two vectors and bind them into a df
c1 <- c("v-3", "v")
c2 <- c("1-v",0)
df <- data.frame(c1,c2)
df
c1 c2
1 v-3 1-v
2 v 0
#I would like to substitute "v" with a number
v <- 2
df
c1 c2
1 v-3 1-v
2 v 0
Now, how can I convert the class of the elements of the df, so that the "v" can be substituted and the values calculated? Or maybe I can read the csv in such a way that my mix of characters and numbers would be stored in a more friendly way?
Thanks in advance. Greg
You can use str_replace and then map eval/parse to evaluate the expression.
library(dplyr)
library(stringr) # str_replace
library(purrr)   # map_dbl
df %>%
mutate(
across(everything(), str_replace, "v", "2"),
across(everything(), ~map_dbl(., function(to_eval) eval(parse(text=to_eval))))
)
c1 c2
1 -1 -1
2 2 0
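As a variation on the same eval/parse idea, the placeholder does not have to be replaced in the text at all: the expression can be evaluated with v supplied as a variable. The following is only a sketch in base R under that assumption; eval_cell and vars are hypothetical names introduced for illustration, not part of the answer above.

```r
# Example data from the question
df <- data.frame(c1 = c("v-3", "v"), c2 = c("1-v", "0"))

# Hypothetical helper: evaluate one expression string,
# looking up variables such as v in the list `vars`
eval_cell <- function(txt, vars) eval(parse(text = txt), envir = vars)

vars <- list(v = 2)

# Evaluate every cell, column by column, keeping the data.frame shape
res <- df
res[] <- lapply(df, function(col) {
  vapply(as.character(col), eval_cell, numeric(1),
         vars = vars, USE.NAMES = FALSE)
})
res
```

Changing the value only requires a different list, e.g. list(v = 5). As with any eval/parse approach, this should only be used on input you trust.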
This might be a more efficient way to do what you're after:
Write a small function that uses gsub to replace the letter with a value, evaluates the resulting expressions, and returns a data.frame.
Here's the function:
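fun <- function(df, patt, repl, fixed = TRUE) {
  # Substitute the value and write one expression per line to a temporary file
  fil <- tempfile()
  on.exit(unlink(fil)) # remove the temporary file when the function exits
  writeLines(gsub(patt, repl, as.matrix(df), fixed = fixed), con = fil)
  # Parse the whole file at once, evaluate each expression, and refill df
  df[] <- sapply(parse(fil), eval)
  df
}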
Here's how you'd use the function:
fun(df, "v", 2)
## c1 c2
## 1 -1 -1
## 2 2 0
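One thing to keep in mind about the fixed argument: with fixed = TRUE the pattern is matched literally, so every "v" is replaced, including one that is part of a longer name. The sketch below illustrates the difference on two made-up expressions; "var" is a hypothetical second variable name used only for this illustration.

```r
# "var" is a hypothetical name, included only to show the difference
exprs <- c("v-3", "var*v")

# Literal replacement (what fixed = TRUE does): every "v" is replaced
literal <- gsub("v", "2", exprs, fixed = TRUE)

# Regular expression with word boundaries: only a standalone v is replaced
whole_word <- gsub("\\bv\\b", "2", exprs, perl = TRUE)

literal
whole_word
```

If your expressions can contain other names with a "v" in them, calling the function with a word-boundary pattern and fixed = FALSE, for example fun(df, "\\bv\\b", 2, fixed = FALSE), should avoid the unwanted replacements.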
Here's a timing comparison with the other answer, with a larger dataset.
options <- c("v-3", "v", "v*2", "1-v", "v/5", 0, "v+2")
nrow <- 10000
ncol <- 20
set.seed(1)
df <- data.frame(matrix(sample(options, nrow*ncol, TRUE),
nrow = nrow, ncol = ncol))
fun2 <- function(df, patt, repl) {
# df = input data.frame
# patt = pattern to search for
# repl = replacement value (as character)
df %>%
mutate(
across(everything(), str_replace, patt, repl),
across(everything(), ~map_dbl(., function(to_eval) eval(parse(text=to_eval))))
)
}
library(microbenchmark)
microbenchmark(fun(df, "v", 2), fun2(df, "v", "2"), times = 10)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# fun(df, "v", 2) 831.731 924.9648 1159.544 1012.590 1366.072 1882.586 10 a
# fun2(df, "v", "2") 4471.800 4721.3587 4847.252 4853.269 4959.595 5157.823 10 b
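Since the test data is built from only a handful of distinct expressions, another option might be to evaluate each distinct expression once and then look the results up for every cell. This is only a sketch and I have not benchmarked it; eval_unique, vars and small are hypothetical names, and it assumes every cell evaluates to a single number.

```r
# Hypothetical helper (not from either answer): evaluate each distinct
# expression once, then map the values back onto every cell.
eval_unique <- function(df, vars) {
  m <- as.matrix(df)            # character matrix of expressions
  u <- unique(as.vector(m))     # distinct expression strings
  vals <- vapply(u, function(e) eval(parse(text = e), envir = vars),
                 numeric(1), USE.NAMES = FALSE)
  # match() gives, for each cell, the position of its expression in u
  out <- matrix(vals[match(m, u)], nrow = nrow(m), dimnames = dimnames(m))
  as.data.frame(out)
}

# Small example using the data from the question
small <- data.frame(c1 = c("v-3", "v"), c2 = c("1-v", "0"))
res_unique <- eval_unique(small, list(v = 2))
res_unique
```

Whether this helps depends on how many distinct expressions the real data contains; if almost every cell is different, there is nothing to gain from the lookup.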