
Parse Drools rule file with Java regex

I'm interested in parsing a Drools rule file using regular expressions. Having a string with the content of the whole .drl file, I'd like to have 4 substrings:

  1. A substring with the content of <name>
  2. A substring with the content of <attribute>
  3. A substring with the content of <conditional element>
  4. A substring with the content of <action>

A Drools rule has the following structure, according to the official documentation:

rule "<name>"
    <attribute>*
when
    <conditional element>*
then
    <action>*
end

I've tried using this pattern, but it hasn't worked well:

^rule"(.|\n|\r|\t)+"(.|\n|\r|\t)+\bwhen\b(.|\n|\r|\t)+\bthen\b(.|\n|\r|\t)+\bend\b?$

Does anyone have an idea of how I could proceed?

I know your question is about regular expressions, but I would strongly advise against using them here. Far too many valid cases will break your pattern. For instance, a rule name that is a single word doesn't need quotes, the rule keyword doesn't have to be the first thing on the line, and so on:

/*this is a comment on the start of the line*/ rule X...
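To make those failure cases concrete, here is a minimal sketch. The pattern in it is a simplified, hypothetical header regex (not the one from the question), and the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

public class RuleHeaderCheck {
    // Simplified, hypothetical header pattern: optional indentation, the
    // keyword "rule", whitespace, then a double-quoted name.
    static final Pattern QUOTED_HEADER =
            Pattern.compile("^\\s*rule\\s+\"([^\"]+)\"");

    public static boolean findsHeader(String line) {
        return QUOTED_HEADER.matcher(line).find();
    }

    public static void main(String[] args) {
        // Quoted name at the start of the line: prints true
        System.out.println(findsHeader("rule \"my rule\""));
        // Unquoted single-word name: prints false
        System.out.println(findsHeader("rule X"));
        // Comment before the keyword: prints false
        System.out.println(findsHeader("/* a comment */ rule \"my rule\""));
    }
}
```

Each missed case can be patched in the pattern, but every patch makes the regex longer and harder to trust.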

Instead of a regexp, just use DrlParser directly; it will give you all the information you need:

String drl = "package foo \n"
                 + "declare Bean1 \n"
                 + "field1: java.math.BigDecimal \n"
                 + "end \n"
                 + "rule bigdecimal\n"
                 + "when \n"
                 + "Bean1( field1 == 0B ) \n"
                 + "then \n"
                 + "end";

DrlParser parser = new DrlParser(LanguageLevelOption.DRL6);
PackageDescr pkgDescr = parser.parse( null, drl );

PackageDescr.getRules() returns every RuleDescr in the file, and each RuleDescr has a getName() that gives you the rule name, and so on. It is all type safe, with no edge cases to handle yourself.

You almost had it. This works:

^rule\s+\"(.|\n|\r|\t)+\"(.|\n|\r|\t)+\bwhen\b(.|\n|\r|\t)+\bthen\b(.|\n|\r|\t)+\bend\b?$

Another solution:

^\s*rule\s+\"([^\"]+)\"[\s\S]+\s+when\s+([\s\S]+)\s+then\s+([\s\S]+)\send\s*$
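As a rough sketch of how the second pattern could be used from Java to get the four substrings the question asks for: the version below is a variant, not the exact pattern above. It adds a capture group for the attribute section and makes the quantifiers lazy; both changes, and the names SingleRuleRegex and split, are assumptions of this example. It expects the input to contain exactly one rule with a quoted name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleRuleRegex {
    // Groups: 1 = name, 2 = attributes, 3 = conditions, 4 = actions.
    // The attribute group and the lazy quantifiers are additions of this sketch.
    static final Pattern RULE = Pattern.compile(
            "^\\s*rule\\s+\"([^\"]+)\"([\\s\\S]*?)\\s+when\\s+"
          + "([\\s\\S]+?)\\s+then\\s+([\\s\\S]+?)\\s+end\\s*$");

    /** Returns {name, attributes, conditions, actions}, or null if there is no match. */
    public static String[] split(String ruleText) {
        Matcher m = RULE.matcher(ruleText);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(2).trim(), m.group(3).trim(), m.group(4).trim()
        };
    }

    public static void main(String[] args) {
        String drl = "rule \"Check age\"\n"
                   + "    salience 10\n"
                   + "when\n"
                   + "    Person( age > 18 )\n"
                   + "then\n"
                   + "    System.out.println(\"adult\");\n"
                   + "end\n";
        String[] parts = split(drl);
        System.out.println("name:       " + parts[0]);
        System.out.println("attributes: " + parts[1]);
        System.out.println("conditions: " + parts[2]);
        System.out.println("actions:    " + parts[3]);
    }
}
```

This still inherits the limitations mentioned in the other answer: unquoted names, comments, and files with several rules are not handled.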

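Note: Your pattern was missing the whitespace after rule (\\s+ in a Java string), and inside a Java string literal each " has to be escaped as \".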

Tips:

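  • Use \\s for whitespace characters (written here as in a Java string literal, where each regex backslash is doubled).
  • Use [^\"] to match any character other than ".
  • Use [\\s\\S] to match any character, including line breaks.
  • \\b is a word boundary: it matches at the edge of a run of [a-zA-Z0-9_] characters, whereas \\s+ requires actual whitespace. Using \\s+ around the keywords is just an extra precaution in case an attribute starts with a special character.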
  • Use a program like Rad Software Regular Expression Designer. That will dramatically simplify editing and testing your regex code.
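The tips above can be checked with a few one-liners. This is only an illustrative sketch; the class RegexTipsDemo and the helper found are made-up names:

```java
import java.util.regex.Pattern;

public class RegexTipsDemo {
    // True if the regex matches anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "\\s" in Java source is the regex \s; "\"" is a literal double quote.
        System.out.println(found("rule\\s+\"", "rule \t\"name\""));   // true

        // [^"] matches any character except the double quote.
        System.out.println(found("^[^\"]+$", "no quotes here"));      // true
        System.out.println(found("^[^\"]+$", "has \" quote"));        // false

        // [\s\S] matches line breaks; a plain dot does not by default.
        System.out.println(found("a[\\s\\S]b", "a\nb"));              // true
        System.out.println(found("a.b", "a\nb"));                     // false

        // \b only needs a word/non-word edge; \s+ needs real whitespace.
        System.out.println(found("\\bwhen\\b", "(when)"));            // true
        System.out.println(found("\\s+when\\s+", "(when)"));          // false
        System.out.println(found("\\bwhen\\b", "whenever"));          // false
    }
}
```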

