How do I pre-determine mutually exclusive comparisons?

The human eye can see that no value x satisfies the condition

x<1 & x>2

but how can I make R see that? I want to use this in a function that is passed comparisons (say, as strings) and not necessarily data. Say I want to write a function that checks whether a combination of comparisons can ever be fulfilled, like this:

areTherePossibleValues <- function(someString){
    someCode
}

areTherePossibleValues("x<1 & x>2")
[1] FALSE

One could do that by interpreting the substrings that are comparison signs and so on, but I feel like there has to be a better way. The R comparison functions ('<', '>', '==' and so on) themselves might actually be the answer to this, right?

Another option is to use the library validatetools (disclaimer, I'm its author).

library(validatetools)

rules <- validator( r1 = x < 1, r2 = x > 2)
is_infeasible(rules)
# [1] TRUE

make_feasible(rules)
# Dropping rule(s): "r1"
# Object of class 'validator' with 1 elements:
#  r2: x > 2
# Rules are evaluated using locally defined options

# create a set of rules that all must hold:
rules <- validator( x > 1, x < 2, x < 2.5)
is_infeasible(rules)
# [1] FALSE

remove_redundancy(rules)
# Object of class 'validator' with 2 elements:
#  V1: x > 1
#  V2: x < 2

rules <- validator( x >= 1, x < 1)
is_infeasible(rules)
# [1] TRUE

To compare among ranges, the min of the range max(s) should always be greater than the max of the range min(s), as shown below:

library(dplyr)
library(stringr)

areTherePossibleValues <- function(s) {
  str_split(s, pattern = " *& *", simplify = TRUE)[1, ] %>%
    {lapply(c("max" = "<", "min" = ">"), function(x) str_subset(., pattern = x) %>% str_extract(., pattern = "[0-9.]+"))} %>%
    # convert to numeric before taking min/max, otherwise the strings are compared alphabetically
    {min(as.numeric(.$max)) > max(as.numeric(.$min))}
}

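Update: add inclusive comparisons (<= and >=)

The only difference is that the min of the range max(s) can be equal to the max of the range min(s). Strictly speaking this holds only when both of those bounds are inclusive; the code below applies >= as soon as any comparison contains =, so a mixed case such as "x>=1 & x<1" is reported as possible.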

library(dplyr)
library(stringr)

areTherePossibleValues <- function(s) {
  str_split(s, pattern = " *& *", simplify = TRUE)[1, ] %>%
    {lapply(c("max" = "<", "min" = ">"), function(x) str_subset(., pattern = x) %>% str_remove(., pattern = paste0("^.*", x)))} %>%
    # convert to numeric before taking min/max, otherwise the strings are compared alphabetically
    {ifelse(sum(grepl(pattern = "=", unlist(.))),
            min(as.numeric(str_remove(.$max, "="))) >= max(as.numeric(str_remove(.$min, "="))),
            min(as.numeric(.$max)) > max(as.numeric(.$min)))}
}

areTherePossibleValues("x<1 & x>2")

areTherePossibleValues("x>1 & x<2")

areTherePossibleValues("x>=1 & x<1")
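The >= test above is applied whenever any comparison contains =, so a mixed case such as "x>=1 & x<1" needs the strictness of each bound to be tracked separately. Below is a minimal sketch of that idea in base R, assuming every comparison has the form x <op> number with <op> one of <, <=, >, >=. The helper name hasPossibleValues is hypothetical and not part of the answer above.

```r
# Hypothetical helper: keep the tightest lower and upper bound seen so far,
# together with a flag saying whether that bound is inclusive.
hasPossibleValues <- function(s) {
  parts <- trimws(strsplit(s, "&", fixed = TRUE)[[1]])
  lower <- -Inf; lower_incl <- FALSE
  upper <- Inf;  upper_incl <- FALSE
  for (p in parts) {
    # m = c(full match, operator, number); assumes the form "x <op> number"
    m <- regmatches(p, regexec("^x *(<=|>=|<|>) *(-?[0-9.]+)$", p))[[1]]
    if (length(m) != 3) stop("unsupported comparison: ", p)
    op <- m[2]
    value <- as.numeric(m[3])
    incl <- op %in% c("<=", ">=")
    if (op %in% c(">", ">=")) {
      # a larger lower bound is tighter; on a tie the strict bound is tighter
      if (value > lower || (value == lower && !incl)) {
        lower <- value; lower_incl <- incl
      }
    } else {
      # a smaller upper bound is tighter; on a tie the strict bound is tighter
      if (value < upper || (value == upper && !incl)) {
        upper <- value; upper_incl <- incl
      }
    }
  }
  # non-empty interval, or a single point that both bounds include
  lower < upper || (lower == upper && lower_incl && upper_incl)
}

hasPossibleValues("x>=1 & x<1")   # FALSE
hasPossibleValues("x>=1 & x<=1")  # TRUE
```

Keeping the inclusive flag next to each bound avoids the need for a numeric tolerance.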

Here is my way of solving it. It may not be the best, but it should work even if you have many comparisons.

Let's call the numbers that appear in your comparisons 'cutoffs'. Then all we need to do is test one number between each pair of cutoffs, one number that is larger than the max cutoff, and one number that is smaller than the min cutoff.

The intuition is illustrated with the plot:

[Plot not included]

Here is the code:

areTherePossibleValues <- function(s){

  # first get the numbers that appear in your string (decimals included), sort them, and call them the cutoffs
  cutoffs = sort(as.numeric(regmatches(s, gregexpr("-?[0-9]+\\.?[0-9]*", s))[[1]]))

  # get the numbers in between each pair of cutoffs, and a bit larger/smaller than the max/min of the cutoffs;
  # the cutoffs themselves are tested too, so that a single feasible point such as "x == 1 & x > 0" is found
  testers = c(cutoffs, (c(min(cutoffs)-1, cutoffs) + c(cutoffs, max(cutoffs) + 1))/2)

  # take out each comparison
  comparisons = strsplit(s,  "&")[[1]]

  # check if ANY tester satisfies all comparisons
  any(sapply(testers, function(te){

    # check if a tester satisfies ALL comparisons (x is bound to the tester value)
    all(sapply(comparisons, function(co){eval(parse(text = co), list(x = te))}))
  }))
}

areTherePossibleValues("x<1 & x>2")
#[1] FALSE

areTherePossibleValues("x>1 & x<2 & x < 2.5")
#[1] TRUE

areTherePossibleValues("x>= 1 & x < 1")
#[1] FALSE

FUNCTION

The function processes a combination, or a vector of combinations, of comparisons that contain <, >, >=, <=, or = (or ==).

areTherePossibleValues = function(condition = "x<1 & x>2", tolerance = 1e-10){

    #Attach all comparison into one
    condition = paste(condition, collapse = "&")

    #PARSE
    condition = tolower(condition) #make everything lowercase just in case
    condition = gsub("[ x]","",condition) #Remove spaces and 'x'
    condition = gsub(">=","g",condition) # >= to g(reater than or equal to)
    condition = gsub("<=","s",condition) # <= to s(maller than or equal to)
    condition = gsub("=+","e",condition) # == or = to e(qual)

    #Separate conditions into a list
    condition = unlist(strsplit(condition,"&"))

    #Initiate vector of upper and lower bounds with NA
    Upper = rep(x = NA, times = length(condition))
    Lower = rep(x = NA, times = length(condition))

    #Fill the vector of upper and lower bounds based on comparators and numbers
    for (i in seq_along(condition)){
        number = as.numeric(gsub(pattern = "[<>egs]", replacement = "", condition[i]))
        comparator = substr(condition[i], start = 1, stop = 1)
        if (comparator == ">"){
            Lower[i] = number + tolerance   #just to the right of the number so as to exclude it
        } else if (comparator == "<"){
            Upper[i] = number - tolerance   #just to the left of the number so as to exclude it
        } else if (comparator == "g"){
            Lower[i] = number           #Include the number
        } else if (comparator == "s"){
            Upper[i] = number           #Include the number
        } else if (comparator == "e"){
            Upper[i] = number           #For =, make upper and lower bounds same
            Lower[i] = number
        }
    }

    Upper = as.numeric(Upper[which(is.na(Upper) == FALSE)]) #Remove NAs
    Lower = as.numeric(Lower[which(is.na(Lower) == FALSE)]) #Remove NAs

    if (length(Upper) == 0 && length(Lower) > 0){
        #1. If Upper has 0 length and Lower has more than 0, it means
        # x is constrained only by lower bounds, so the condition can always be fulfilled
        ans = TRUE
    } else if (length(Lower) == 0 && length(Upper) > 0){
        #2. If Lower has 0 length and Upper has more than 0, it means
        # x is constrained only by upper bounds, so the condition can always be fulfilled
        ans = TRUE
    } else {
        # If the smallest upper bound is at least as big as the largest lower bound,
        # the condition can be fulfilled.
        ans = (min(Upper) - max(Lower)) >= 0
    }

    if (ans == FALSE){
        return(ans)
    } else {
        return(paste(ans," for (",max(Lower)," < x < ",min(Upper),")",sep = ""))
    }
}

USAGE

areTherePossibleValues(">=5 & <50 & >30 & >45")
#[1] "TRUE for (45.0000000001 < x < 49.9999999999)"

areTherePossibleValues("x>5 & x<3")
#[1] FALSE

areTherePossibleValues(c("<5",">=2 & =4"))
#[1] "TRUE for (4 < x < 4)"

We see that x<1 & x>2 is impossible because we are taught a simple rule: if a number x is smaller than another number a, then it cannot be bigger than any number that is bigger than a. More fundamentally, we are using the transitivity property of any partially ordered set. There is no reason we cannot teach a computer (or R) to see that. If the logic string in your question consists only of statements of the form x # a, where # can be <, >, <=, or >=, and the operator is always &, then Yue Y's solution above answers your question. It can even be generalized to include the | operator. Beyond this, you'll have to be more specific about what the logic expression can be.
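The generalization to | can be sketched as follows. This is only an illustration under stated assumptions: the expression contains a single variable x compared against numeric constants, combined with &, | and parentheses, and negative constants are written with a space after the operator (x < -1, since R parses x<-1 as an assignment). The function name canBeSatisfied is hypothetical. Because the truth value of such an expression can only change at a constant, it is enough to test every constant, one point between each pair of neighbouring constants, and one point beyond each end. Note that eval(parse()) runs arbitrary code, so this should only be used on trusted strings.

```r
# Hypothetical sketch: evaluate the whole expression at a finite set of test points.
canBeSatisfied <- function(s) {
  # every numeric constant in the expression is a cutoff
  cutoffs <- sort(unique(as.numeric(
    regmatches(s, gregexpr("-?[0-9]+\\.?[0-9]*", s))[[1]]
  )))
  if (length(cutoffs) == 0) stop("no numeric constants found")
  # midpoints between neighbouring cutoffs (empty when there is only one cutoff)
  midpoints <- (cutoffs[-length(cutoffs)] + cutoffs[-1]) / 2
  # also test the cutoffs themselves and one point beyond each end
  testers <- c(min(cutoffs) - 1, cutoffs, midpoints, max(cutoffs) + 1)
  # &, | and the comparison operators are vectorised, so one eval covers all test points
  any(eval(parse(text = s), list(x = testers)))
}

canBeSatisfied("x<1 & x>2")   # FALSE
canBeSatisfied("x<1 | x>2")   # TRUE
```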
