How can I parse a "query notation" string?

Suppose I have a service that fetches Item data from some data object (e.g. an ArrayList). The service implements a method that takes a query as a string and returns a list of the Items that match the query. Simple as that. Method signature: public List<Item> query(String query) {}

Now for the challenge: the query is written in prefix notation and consists of one or more of these SQL-like operators: EQUAL, GREATER_THAN, LESS_THAN, AND, OR, NOT. Each operator is followed by parentheses containing comma-separated arguments: a field name and a value (of an Item) for the comparison operators, or nested expressions for AND, OR and NOT. Examples:

"EQUAL(id,\"id2\")" -> search for item with id = "id2"
"OR(EQUAL(id,\"id1\"),EQUAL(id,\"id2\"))" -> search for items with id = "id1" or id = "id2"
"GREATER_THAN(views,41)" -> search for item with views > 41

A more complicated example:

"OR(EQUAL(id,\"id1\"),AND(GREATER_THAN(views,100),EQUAL(id,\"id2\")))"

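To pin down what such a query means: assuming a hypothetical Item record with id and views fields (the actual Item class isn't shown), the last query corresponds to the following hand-written java.util.function.Predicate. A parser then only has to build this kind of predicate tree from the string, and the query method itself reduces to a stream filter. QueryExamples, idEquals and viewsGreaterThan are illustrative names only.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical item type: the question only mentions the fields "id" and "views".
record Item(String id, int views) {}

final class QueryExamples {
    // Counterpart of EQUAL(id, value)
    static Predicate<Item> idEquals(String value) {
        return item -> item.id().equals(value);
    }

    // Counterpart of GREATER_THAN(views, value)
    static Predicate<Item> viewsGreaterThan(int value) {
        return item -> item.views() > value;
    }

    // OR(EQUAL(id,"id1"),AND(GREATER_THAN(views,100),EQUAL(id,"id2"))), written by hand
    static Predicate<Item> complicatedExample() {
        return idEquals("id1").or(viewsGreaterThan(100).and(idEquals("id2")));
    }

    // Once the query is a Predicate<Item>, filtering the list is a one-liner.
    static List<Item> query(List<Item> items, Predicate<Item> filter) {
        return items.stream().filter(filter).toList();
    }
}
```

Note that the nesting of the method calls, not any operator precedence, decides which operands AND and OR combine, exactly as in the prefix notation.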
Is there an out-of-the-box parser for this? Or do you have any idea how to parse it?

Thanks in advance.

I'm still stuck on how to parse this kind of query in general, in particular with regard to the precedence of NOT, AND and OR.

The format is unambiguous, so you don't have to worry about precedence: the first element is always the operator and its operands follow in parentheses, so the nesting itself determines what each operator applies to. Since expressions can be nested, I would suggest a recursive approach similar to this:

```java
// Evaluator<YourType>, getNumberOfOperands(String) and getEvaluatorForOperator(String, Operand[])
// are placeholders: supply them for your own item type.
record IntermediateResult(Evaluator<YourType> evaluator, int lastIndex){}
sealed interface Operand permits EvaluatedOperand, DirectOperand{}
record EvaluatedOperand(Evaluator<YourType> operand) implements Operand{}
record DirectOperand(String operand) implements Operand{}

public IntermediateResult parse(String toParse, int startIndex){
    //find the start of the operands, or the end of the operand if it isn't an expression to parse
    //TODO handle double quotes here: a quoted value containing ',', '(' or ')' breaks this scan
    int openBrackStart=toParse.indexOf("(",startIndex);
    int nextClosedBrack=toParse.indexOf(")",startIndex);
    if(nextClosedBrack==-1){
        nextClosedBrack=toParse.length();
    }
    int nextComma=toParse.indexOf(",",startIndex);
    if(nextComma==-1){
        nextComma=toParse.length();
    }
    //a plain operand ends at the next ',' or ')'
    int operandEnd=Math.min(nextClosedBrack,nextComma);
    if(openBrackStart==-1||openBrackStart>operandEnd){
        return new IntermediateResult(null, operandEnd);//no subexpressions, it's just an operand
    }
    //there are subexpressions (inside parentheses): parse those
    String operator=toParse.substring(startIndex,openBrackStart);
    int numOperands=getNumberOfOperands(operator);
    Operand[] operands=new Operand[numOperands];
    int operandStart=openBrackStart+1;
    for(int i=0;i<numOperands;i++){
        //parse each operand recursively
        IntermediateResult subExpression=parse(toParse, operandStart);
        if(subExpression.evaluator()==null){
            //plain operand (field name or value): keep its raw text
            operands[i]=new DirectOperand(toParse.substring(operandStart, subExpression.lastIndex()));
        }else{
            //nested expression: keep its evaluator
            operands[i]=new EvaluatedOperand(subExpression.evaluator());
        }
        //lastIndex is the ',' or ')' that ended this operand, so skip past it
        operandStart=subExpression.lastIndex()+1;
    }
    return new IntermediateResult(getEvaluatorForOperator(operator, operands), operandStart);
}
```

Note that this is just an untested sketch and I didn't check edge cases, but it should give a rough idea:

It first checks whether there are subexpressions. If there aren't, it returns an IntermediateResult that does not need to be parsed further (its evaluator is null). (Alternatively, a sealed interface could be used to distinguish these two cases.)
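A minimal sketch of that sealed-interface alternative (Java 17+). ParseResult, RawOperand and ParsedExpression are hypothetical names, and java.util.function.Predicate<Object> stands in for the undefined Evaluator<YourType>. The raw operand text is stored in the result, so callers branch on the type instead of checking for null:

```java
import java.util.function.Predicate;

// Result of parsing one operand: either raw text (a field name or value) or a parsed
// sub-expression. Both record the index of the ',' or ')' that ended them.
sealed interface ParseResult permits RawOperand, ParsedExpression {
    int lastIndex();
}

record RawOperand(String text, int lastIndex) implements ParseResult {}

// Predicate<Object> is a stand-in for the undefined Evaluator<YourType>.
record ParsedExpression(Predicate<Object> evaluator, int lastIndex) implements ParseResult {}

final class ParseResults {
    // Callers branch on the concrete type instead of testing evaluator == null.
    static String describe(ParseResult result) {
        if (result instanceof RawOperand raw) {
            return "raw operand '" + raw.text() + "' ending at " + raw.lastIndex();
        }
        if (result instanceof ParsedExpression expression) {
            return "sub-expression ending at " + expression.lastIndex();
        }
        throw new IllegalStateException("unreachable: ParseResult is sealed");
    }
}
```

On Java 21+, a pattern-matching switch over the sealed interface can replace the if-chain, and the compiler then checks that every case is handled.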

If there are subexpressions, it extracts the operator and then parses each operand recursively. Operands that don't need to be evaluated (field names and values) are stored as a DirectOperand, and nested expressions are stored as an EvaluatedOperand. It then continues with the next operand, which is why it needs to know how many operands the operator takes (getNumberOfOperands()).
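One possible getNumberOfOperands, assuming the arities implied by the question's examples (two arguments for the comparisons and for AND and OR, one for NOT). The Operator enum is hypothetical:

```java
// Hypothetical operator table; arities follow the question's examples.
enum Operator {
    EQUAL(2), GREATER_THAN(2), LESS_THAN(2), AND(2), OR(2), NOT(1);

    private final int operandCount;

    Operator(int operandCount) {
        this.operandCount = operandCount;
    }

    int operandCount() {
        return operandCount;
    }

    // Possible getNumberOfOperands: Enum.valueOf throws IllegalArgumentException
    // for names that are not operators.
    static int getNumberOfOperands(String operatorName) {
        return valueOf(operatorName.trim()).operandCount();
    }
}
```

If AND and OR should accept any number of operands, a fixed count no longer works; the loop would instead keep parsing operands until it reaches the closing parenthesis.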

After the expression is parsed, a new IntermediateResult is returned containing an object that knows how to evaluate the expression (from getEvaluatorForOperator()) and the index where the parsed expression ends.
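A sketch of getEvaluatorForOperator, assuming java.util.function.Predicate<Item> as the evaluator type and a hypothetical Item record with id and views fields. Comparison operators use their two DirectOperands (field name and literal); logical operators combine their EvaluatedOperands:

```java
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

// Hypothetical item type: the question only shows the fields "id" and "views".
record Item(String id, int views) {}

// Same operand split as in the sketch, with Predicate<Item> as the evaluator type.
sealed interface Operand permits EvaluatedOperand, DirectOperand {}
record EvaluatedOperand(Predicate<Item> operand) implements Operand {}
record DirectOperand(String operand) implements Operand {}

final class Evaluators {
    // Logical operators combine nested predicates; comparisons build a predicate
    // from a field name and a literal value.
    static Predicate<Item> getEvaluatorForOperator(String operator, Operand[] operands) {
        return switch (operator) {
            case "AND" -> nested(operands[0]).and(nested(operands[1]));
            case "OR" -> nested(operands[0]).or(nested(operands[1]));
            case "NOT" -> nested(operands[0]).negate();
            case "EQUAL", "GREATER_THAN", "LESS_THAN" ->
                    comparison(operator, direct(operands[0]), unquote(direct(operands[1])));
            default -> throw new IllegalArgumentException("Unknown operator: " + operator);
        };
    }

    private static Predicate<Item> comparison(String operator, String field, String value) {
        // cmp(item) is negative, zero or positive as the field is below, equal to or above value
        ToIntFunction<Item> cmp = switch (field) {
            case "id" -> item -> item.id().compareTo(value);
            case "views" -> {
                int number = Integer.parseInt(value);
                yield item -> Integer.compare(item.views(), number);
            }
            default -> throw new IllegalArgumentException("Unknown field: " + field);
        };
        return switch (operator) {
            case "EQUAL" -> item -> cmp.applyAsInt(item) == 0;
            case "GREATER_THAN" -> item -> cmp.applyAsInt(item) > 0;
            default -> item -> cmp.applyAsInt(item) < 0; // LESS_THAN
        };
    }

    private static Predicate<Item> nested(Operand operand) {
        if (operand instanceof EvaluatedOperand evaluated) {
            return evaluated.operand();
        }
        throw new IllegalArgumentException("Expected a nested expression, got " + operand);
    }

    private static String direct(Operand operand) {
        if (operand instanceof DirectOperand raw) {
            return raw.operand().trim();
        }
        throw new IllegalArgumentException("Expected a field name or value, got " + operand);
    }

    // Removes the surrounding double quotes of a string literal such as "id2".
    private static String unquote(String literal) {
        boolean quoted = literal.length() >= 2 && literal.startsWith("\"") && literal.endsWith("\"");
        return quoted ? literal.substring(1, literal.length() - 1) : literal;
    }
}
```

Resolving the field name once, when the predicate is built, means an unknown field or a non-numeric views value fails during parsing rather than while filtering.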

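For simplicity, I haven't handled invalid query strings or double quotes (as noted earlier, it's just a sketch). Quoted values are currently kept verbatim, quotes included, and a value containing ',', '(' or ')' breaks the indexOf() scanning, but handling that shouldn't be too difficult.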
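For a complete, runnable variant, here is a sketch that swaps the indexOf() scanning for a regex tokenizer followed by recursive descent over the tokens. Tokenizing first means a quoted value may contain ',', '(' or ')', and malformed queries raise an IllegalArgumentException. It again assumes a hypothetical Item record with id and views fields and binary AND/OR, and it produces a Predicate<Item>; QueryParser and its methods are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical item type: the question only shows the fields "id" and "views".
record Item(String id, int views) {}

// Tokenizes the query first (so quoted values may contain ',', '(' or ')'),
// then parses it recursively: one parseExpression() call per OPERATOR(...) group.
final class QueryParser {
    // A quoted string, a bare word (operator, field or number) or one of ( ) ,
    private static final Pattern TOKEN = Pattern.compile("\\s*(\"[^\"]*\"|[\\w-]+|[(),])");
    private final List<String> tokens = new ArrayList<>();
    private int next;

    private QueryParser(String query) {
        Matcher matcher = TOKEN.matcher(query);
        int end = 0;
        while (end < query.length()) {
            if (!matcher.region(end, query.length()).lookingAt()) {
                throw new IllegalArgumentException("Invalid input at index " + end + " in: " + query);
            }
            tokens.add(matcher.group(1));
            end = matcher.end();
        }
    }

    static Predicate<Item> parse(String query) {
        QueryParser parser = new QueryParser(query.strip());
        Predicate<Item> result = parser.parseExpression();
        if (parser.next != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected trailing input: " + query);
        }
        return result;
    }

    private Predicate<Item> parseExpression() {
        String operator = take();
        expect("(");
        Predicate<Item> result = switch (operator) {
            case "AND", "OR" -> {
                Predicate<Item> left = parseExpression();
                expect(",");
                Predicate<Item> right = parseExpression();
                yield operator.equals("AND") ? left.and(right) : left.or(right);
            }
            case "NOT" -> parseExpression().negate();
            case "EQUAL", "GREATER_THAN", "LESS_THAN" -> {
                String field = take();
                expect(",");
                yield comparison(operator, field, unquote(take()));
            }
            default -> throw new IllegalArgumentException("Unknown operator: " + operator);
        };
        expect(")");
        return result;
    }

    private static Predicate<Item> comparison(String operator, String field, String value) {
        // cmp(item) is negative, zero or positive as the field is below, equal to or above value
        ToIntFunction<Item> cmp = switch (field) {
            case "id" -> item -> item.id().compareTo(value);
            case "views" -> {
                int number = Integer.parseInt(value); // NumberFormatException is an IllegalArgumentException
                yield item -> Integer.compare(item.views(), number);
            }
            default -> throw new IllegalArgumentException("Unknown field: " + field);
        };
        return switch (operator) {
            case "EQUAL" -> item -> cmp.applyAsInt(item) == 0;
            case "GREATER_THAN" -> item -> cmp.applyAsInt(item) > 0;
            default -> item -> cmp.applyAsInt(item) < 0; // LESS_THAN
        };
    }

    private String take() {
        if (next >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of query");
        }
        return tokens.get(next++);
    }

    private void expect(String token) {
        String actual = take();
        if (!actual.equals(token)) {
            throw new IllegalArgumentException("Expected '" + token + "' but found '" + actual + "'");
        }
    }

    private static String unquote(String token) {
        return token.startsWith("\"") ? token.substring(1, token.length() - 1) : token;
    }
}
```

Usage: items.stream().filter(QueryParser.parse("GREATER_THAN(views,41)")).toList() keeps the items with more than 41 views. Quoted values still cannot contain a double quote themselves; supporting that would need an escape rule in the TOKEN pattern.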
