
Pattern matching greater-than and less-than signs

I want to clean special characters from a variable.

mm = data.frame(rule = c('$X <= 0', '$X > 0 & $X <= 17.5', '> 17.5 & $X <= 197.3', '$X > 197.3'))
mm$ruleclean <- gsub('\\&',' ',gsub('\\s+','',gsub('\\$X', '', mm$rule)))

Desired Output:

<=0
0 - 17.5
17.5 - 197.3
> 197.3

The objective is to convert the rules into intervals.

Here's a literal attempt, certainly prone to issues. I'm using magrittr's pipe operator %>% for clarity of code (along the lines of @camille's comment about nested gsub calls), though it is not strictly required for this to work. Also, I changed the first value from 0 to -1 solely to demonstrate some ambiguity you might have with negative numbers.

mm = data.frame(rule = c('$X <= -1', '$X > 0 & $X <= 17.5', '> 17.5 & $X <= 197.3', '$X > 197.3'))

library(magrittr)
gsub("&", "", mm$rule) %>%
  gsub("(\\s*\\$X\\s*)?<=?\\s*", "..", .) %>%
  gsub("(\\$X\\s*)?>=?\\s*", "", .) %>%
  gsub("^\\.\\.", "<=", .) %>%
  gsub("(-[0-9.]+)", "(\\1)", .) %>%
  gsub("\\.\\.", "-", .) %>%
  gsub("^([0-9.]+)$", ">\\1", .)
# [1] "<=(-1)"     "0-17.5"     "17.5-197.3" ">197.3"     

(Edit: corrected the last string.)
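Since the pipe is only there for readability, the same chain can be sketched in base R. This is a minimal sketch, not a different method: the patterns are copied unchanged from the piped version, and the names steps and out are just illustrative.

```r
mm <- data.frame(rule = c('$X <= -1', '$X > 0 & $X <= 17.5',
                          '> 17.5 & $X <= 197.3', '$X > 197.3'))

# Each element is c(pattern, replacement), in the same order as the piped version.
steps <- list(
  c("&", ""),
  c("(\\s*\\$X\\s*)?<=?\\s*", ".."),  # "<=" (and any "$X" before it) becomes a ".." placeholder
  c("(\\$X\\s*)?>=?\\s*", ""),        # drop ">" (and any "$X" before it)
  c("^\\.\\.", "<="),                 # a leading placeholder means "upper bound only"
  c("(-[0-9.]+)", "(\\1)"),           # parenthesize negative numbers
  c("\\.\\.", "-"),                   # remaining placeholders become the range dash
  c("^([0-9.]+)$", ">\\1")            # a bare number means "lower bound only"
)

# Apply the substitutions one after another, starting from the raw rules.
out <- Reduce(function(x, s) gsub(s[1], s[2], x), steps, mm$rule)
out
# [1] "<=(-1)"     "0-17.5"     "17.5-197.3" ">197.3"
```

Passing accumulate = TRUE to Reduce returns every intermediate vector, which can help when debugging a single substitution.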

One thing I'd do differently in the future is to avoid hard-coding the leading <= and > in the first and last strings, which would make this more robust. Furthermore, I might personally prefer the mathematical notation for closed/open ends, along the lines of (,-1] (or (-Inf,-1]), (0,17.5], etc., for two reasons: the meaning is clearer, and it is consistent with the factor levels produced by R's cut.
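A rough sketch of that interval notation follows. It assumes each rule contains at most one lower bound (> or >=) and at most one upper bound (< or <=), with plain decimal numbers. The function name rule_to_interval is hypothetical, and a missing bound is written as -Inf or Inf with an open end.

```r
# Hypothetical helper: "$X > 0 & $X <= 17.5" becomes "(0,17.5]".
rule_to_interval <- function(rule) {
  num <- "(-?[0-9]+\\.?[0-9]*)"                     # optionally signed decimal
  lower_pat <- paste0("^.*>(=?)\\s*", num, ".*$")   # group 1: "=" if closed, group 2: value
  upper_pat <- paste0("^.*<(=?)\\s*", num, ".*$")

  has_lower <- grepl(lower_pat, rule, perl = TRUE)
  has_upper <- grepl(upper_pat, rule, perl = TRUE)

  # Use the captured number where a bound exists, otherwise an infinite bound.
  lower <- ifelse(has_lower, sub(lower_pat, "\\2", rule, perl = TRUE), "-Inf")
  upper <- ifelse(has_upper, sub(upper_pat, "\\2", rule, perl = TRUE), "Inf")

  # A bracket is closed only when the operator included "=".
  left  <- ifelse(has_lower & sub(lower_pat, "\\1", rule, perl = TRUE) == "=", "[", "(")
  right <- ifelse(has_upper & sub(upper_pat, "\\1", rule, perl = TRUE) == "=", "]", ")")

  paste0(left, lower, ",", upper, right)
}

rules <- c('$X <= -1', '$X > 0 & $X <= 17.5', '> 17.5 & $X <= 197.3', '$X > 197.3')
rule_to_interval(rules)
# [1] "(-Inf,-1]"    "(0,17.5]"     "(17.5,197.3]" "(197.3,Inf)"
```

Because the operators are read from each rule rather than hard-coded for the first and last strings, negative bounds need no special handling here.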
