I want to clean special characters from a variable.
mm = data.frame(rule = c('$X <= 0', '$X > 0 & $X <= 17.5', '> 17.5 & $X <= 197.3', '$X > 197.3'))
mm$ruleclean <- gsub('\\&',' ',gsub('\\s+','',gsub('\\$X', '', mm$rule)))
Desired Output:
<=0
0 - 17.5
17.5 - 197.3
> 197.3
The objective is to convert the rules into intervals.
Here's a verbatim attempt, certainly prone to issues. I'm using magrittr's pipe operator %>% for clarity of code (along the lines of @camille's comment about nested gsubs), though it is not strictly required for the code to work. Also, I changed the first value from 0 to -1 solely to demonstrate some ambiguity you might have with negative numbers.
mm = data.frame(rule = c('$X <= -1', '$X > 0 & $X <= 17.5', '> 17.5 & $X <= 197.3', '$X > 197.3'))
library(magrittr)
gsub("&", "", mm$rule) %>%
gsub("(\\s*\\$X\\s*)?<=?\\s*", "..", .) %>%
gsub("(\\$X\\s*)?>=?\\s*", "", .) %>%
gsub("^\\.\\.", "<=", .) %>%
gsub("(-[0-9.]+)", "(\\1)", .) %>%
gsub("\\.\\.", "-", .) %>%
gsub("^([0-9.]+)$", ">\\1", .)
# [1] "<=(-1)" "0-17.5" "17.5-197.3" ">197.3"
(Edit: corrected the last string.)
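Since the pipe is not strictly required, here is a sketch of the same chain in base R only. It stores each pattern/replacement pair in a list and folds them over the input with Reduce. The names rules, steps and clean are my own choices for illustration, and the input is a plain character vector rather than the data frame column.

```r
# Same rules as above, kept as a plain character vector
rules <- c('$X <= -1', '$X > 0 & $X <= 17.5', '> 17.5 & $X <= 197.3', '$X > 197.3')

# Each step is c(pattern, replacement), in the same order as the piped version
steps <- list(
  c("&", ""),                           # drop the ampersand
  c("(\\s*\\$X\\s*)?<=?\\s*", ".."),    # mark an upper bound with ".."
  c("(\\$X\\s*)?>=?\\s*", ""),          # strip the lower-bound operator
  c("^\\.\\.", "<="),                   # leading ".." means "upper bound only"
  c("(-[0-9.]+)", "(\\1)"),             # parenthesise negative numbers
  c("\\.\\.", "-"),                     # remaining ".." becomes the range dash
  c("^([0-9.]+)$", ">\\1")              # a lone number means "lower bound only"
)

# Apply the substitutions one after another, starting from `rules`
clean <- Reduce(function(x, s) gsub(s[1], s[2], x), steps, rules)
print(clean)
```

Keeping the steps in a list makes it a little easier to reorder or comment out a single substitution while debugging.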
One thing I'd do differently in the future is avoid hard-coding the leading <= and > in the first/last strings, which would make this more robust. Furthermore, I might personally prefer the mathematical notation for closed/open ends, along the lines of (,-1] (or (-Inf,-1]), (0,17.5], etc., for two reasons: the meaning is clearer, and it is consistent with R's cut factor levels.
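A rough sketch of that interval-notation idea, under some assumptions: each rule has at most one lower bound (> or >=) and at most one upper bound (< or <=), and a missing bound is treated as -Inf or Inf. The function name rule_to_interval is hypothetical, not something from a package.

```r
# Hypothetical helper: turn '$X > 0 & $X <= 17.5' into '(0,17.5]'
rule_to_interval <- function(rule) {
  num <- "-?[0-9.]+"
  # Capture group 1: optional "=", capture group 2: the number after the operator
  lower_re <- paste0("^.*>(=?)\\s*(", num, ").*$")
  upper_re <- paste0("^.*<(=?)\\s*(", num, ").*$")

  has_lower <- grepl(">", rule, fixed = TRUE)
  has_upper <- grepl("<", rule, fixed = TRUE)

  # Missing bounds fall back to -Inf / Inf (an assumption of this sketch)
  lower <- ifelse(has_lower, sub(lower_re, "\\2", rule), "-Inf")
  upper <- ifelse(has_upper, sub(upper_re, "\\2", rule), "Inf")

  # ">=" / "<=" give a closed end, ">" / "<" (or no bound) an open end
  left  <- ifelse(has_lower & sub(lower_re, "\\1", rule) == "=", "[", "(")
  right <- ifelse(has_upper & sub(upper_re, "\\1", rule) == "=", "]", ")")

  paste0(left, lower, ",", upper, right)
}

rules <- c('$X <= -1', '$X > 0 & $X <= 17.5', '> 17.5 & $X <= 197.3', '$X > 197.3')
intervals <- rule_to_interval(rules)
print(intervals)
# [1] "(-Inf,-1]"    "(0,17.5]"     "(17.5,197.3]" "(197.3,Inf)"
```

Because the operators are read from the rule itself, nothing about the first or last string is hard-coded, and negative numbers need no special handling.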