Parse Drools rule file with Java regex

Question

I'm interested in parsing a Drools rule file using regular expressions. Having a string with the content of the whole .drl file, I'd like to have 4 substrings:

A substring with the content of <name>
A substring with the content of <attribute>
A substring with the content of <conditional element>
A substring with the content of <action>

A Drools rule has the following structure, according to the official documentation:

rule "<name>"
    <attribute>*
when
    <conditional element>*
then
    <action>*
end

I've tried using this pattern, but it hasn't worked well:

^rule"(.|\n|\r|\t)+"(.|\n|\r|\t)+\bwhen\b(.|\n|\r|\t)+\bthen\b(.|\n|\r|\t)+\bend\b?$

Does anyone have an idea of how I could proceed?

Answer 1

I know your question is about regular expressions, but I would strongly advise against using them here. There are far too many cases where your pattern will fail. For instance, a rule name that is a single word doesn't need quotes, and the rule keyword doesn't have to be the first thing on the line:

/*this is a comment on the start of the line*/ rule X...
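
Both cases are easy to reproduce. The sketch below is only an illustration: the class name RegexEdgeCases and the sample rules are made up, and the pattern is a typical quoted-name regex of the kind suggested for this problem, not anything provided by Drools.

```java
import java.util.regex.Pattern;

final class RegexEdgeCases {
    // A typical "quoted name" pattern, assumed here for illustration only.
    static final Pattern RULE = Pattern.compile(
        "^\\s*rule\\s+\"([^\"]+)\"[\\s\\S]+\\s+when\\s+([\\s\\S]+)\\s+then\\s+([\\s\\S]+)\\send\\s*$");

    /** True when the pattern finds a rule in the given text. */
    static boolean matches(String drl) {
        return RULE.matcher(drl).find();
    }

    public static void main(String[] args) {
        // Hypothetical rule body shared by the three samples.
        String body = "\n    salience 1\nwhen\n    Foo()\nthen\n    bar();\nend";
        System.out.println(matches("rule \"Simple\"" + body));               // quoted name
        System.out.println(matches("rule Simple" + body));                   // unquoted single-word name
        System.out.println(matches("/* comment */ rule \"Simple\"" + body)); // keyword not first on the line
    }
}
```

Only the first sample should be reported as a match; the other two are rejected by the pattern even though they follow the rule structure described above.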

Instead of a regex, just use DrlParser directly; it will give you all the information you need:

String drl = "package foo \n"
                 + "declare Bean1 \n"
                 + "field1: java.math.BigDecimal \n"
                 + "end \n"
                 + "rule bigdecimal\n"
                 + "when \n"
                 + "Bean1( field1 == 0B ) \n"
                 + "then \n"
                 + "end";

DrlParser parser = new DrlParser(LanguageLevelOption.DRL6);
PackageDescr pkgDescr = parser.parse( null, drl );

Calling getRules() on the returned PackageDescr gives you every RuleDescr in the file, and each RuleDescr has a getName() that gives you the rule name, and so on. It is all type safe, with no regex edge cases to handle.

Answer 2

You almost got it. This works:

^rule\s+\"(.|\n|\r|\t)+\"(.|\n|\r|\t)+\bwhen\b(.|\n|\r|\t)+\bthen\b(.|\n|\r|\t)+\bend\b?$

Another solution:

^\s*rule\s+\"([^\"]+)\"[\s\S]+\s+when\s+([\s\S]+)\s+then\s+([\s\S]+)\send\s*$

Note: Your pattern was missing the whitespace between rule and the opening quote (\\s+), and the quotes need escaping (" -> \\").
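
To get the four substrings asked for in the question (the pattern above captures only three groups), something along these lines might work. It is a sketch under assumptions, not the answer's exact pattern: the lazy quantifiers and the extra attribute group are a variation, the class and method names are hypothetical, and the input is assumed to hold exactly one rule with a quoted name. It can still be fooled by the words when, then or end appearing inside comments or string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DrlRegexSketch {
    // Groups: 1 = name, 2 = attributes, 3 = conditional elements, 4 = actions.
    static final Pattern RULE = Pattern.compile(
        "^\\s*rule\\s+\"([^\"]+)\"([\\s\\S]*?)\\bwhen\\b([\\s\\S]*?)\\bthen\\b([\\s\\S]*)\\bend\\s*$");

    /** Returns {name, attributes, conditions, actions}, or null when the text does not match. */
    static String[] split(String ruleText) {
        Matcher m = RULE.matcher(ruleText);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(2).trim(), m.group(3).trim(), m.group(4).trim()
        };
    }

    public static void main(String[] args) {
        // Hypothetical sample rule.
        String rule = "rule \"Discount\"\n"
                    + "    salience 10\n"
                    + "when\n"
                    + "    $o : Order( total > 100 )\n"
                    + "then\n"
                    + "    $o.setDiscount( 5 );\n"
                    + "end\n";
        String[] parts = split(rule);
        if (parts == null) {
            System.out.println("no match");
            return;
        }
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
    }
}
```

The last group is greedy on purpose, so that an end keyword inside the action block does not cut the match short; the final end of the input is the one that closes the rule.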

Tips:

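You can use \\s for whitespace characters.
[^\\"] matches any character except ".
[\\s\\S] matches any character, including line breaks.
\\b matches at a word boundary, i.e. at the edge of a run of [a-zA-Z0-9_] characters, while \\s+ runs up to the next non-whitespace character. Using \\s+ around the keywords is just an extra precaution in case an attribute starts with a special character.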
Use a program like Rad Software Regular Expression Designer. That will dramatically simplify editing and testing your regex code.
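
The tips above can also be checked directly in Java. This is a small sketch with made-up inputs and hypothetical helper names; the (?s) flag (Pattern.DOTALL) is not mentioned in the tips and is shown only as an alternative to [\\s\\S].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTipsDemo {
    /** True when the whole input matches the regex. */
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    /** Index of the first match of regex in input, or -1 when there is none. */
    static int firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.b", "a\nb"));               // false: '.' does not match line breaks by default
        System.out.println(fullMatch("a[\\s\\S]b", "a\nb"));        // true: [\s\S] matches any character
        System.out.println(fullMatch("(?s)a.b", "a\nb"));           // true: DOTALL lets '.' match line breaks
        System.out.println(fullMatch("[^\"]+", "my rule"));         // true: no double quote in the input
        System.out.println(firstMatch("\\bend\\b", "append end"));  // 7: the "end" inside "append" is skipped
    }
}
```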
