
Empty action causes 'found no viable alternative'

I am trying to write some Ruta rules that create a Time annotation around dates. The test below shows how I am trying to do it.

@Test
public void test__Ruta__AnnotateDate() throws UIMAException, IOException, URISyntaxException {
    final class RulesRunner {
        public void applyRules(JCas cas, String[] rules) throws AnalysisEngineProcessException, InvalidXMLException, ResourceInitializationException, ResourceConfigurationException, IOException, URISyntaxException {
            for (String aRule: rules) {
                Ruta.apply(cas.getCas(), aRule);
            }
        }
    }

    RulesRunner runner = new RulesRunner();

    JCas cas = JCasFactory.createJCas();
    cas.setDocumentText("Today's date is 2017-04-06.");

    // Tokenize the string
    String[] rules = new String[] {
            "ANY{REGEXP(\"[a-zA-Z0-9]+\") -> Token};",
            "ANY{REGEXP(\"[^ a-zA-Z0-9]+\") -> Token};"
    };
    runner.applyRules(cas, rules);

    rules = new String[] {
        // Does not crash, but gives:
        //   Got Time=2017-04-06
        //   Got Time=-
        //   Got Time=04
        //   Got Time=-
        //   Got Time=06
        //  
        "Token{REGEXP(\"[0-9]{4}\") -> MARK(Time, 1, 5)} Token{REGEXP(\"-\") -> Time} Token{REGEXP(\"[0-9]{2}\") -> Time} Token{REGEXP(\"-\") -> Time} Token{REGEXP(\"[0-9]{2}\") -> Time};"

        // Crashes with exception
        //
        //   org.apache.uima.ruta.extensions.RutaParseRuntimeException: 
        //     Error in Ruta7969125931572676994,  line 1, "}": found no viable alternative
        //
        // "Token{REGEXP(\"[0-9]{4}\") -> MARK(Time, 1, 5)} Token{REGEXP(\"-\") -> } Token{REGEXP(\"[0-9]{2}\") -> } Token{REGEXP(\"-\") -> } Token{REGEXP(\"[0-9]{2}\") -> };"

    };
    runner.applyRules(cas, rules);

    for (Time aTime: JCasUtil.select(cas, Time.class)) {
        System.out.println("Got Time="+aTime.getCoveredText());
    }
}

The test first annotates the tokens, then tries to put a Time annotation around any sequence of tokens of the form ['YYYY', '-', 'MM', '-', 'DD'].

I tried two rules to do this. The first rule "sort of works" in the sense that a Time annotation is indeed put around the whole sequence of Tokens. But it also adds Time annotations around each constituent of the date (except the YYYY part).

In the second rule, I tried to use an empty action for the consequence of the matches for the other Tokens, but that causes a 'found no viable alternative' exception. Aren't empty actions allowed in Ruta? If not, how would I go about putting a single annotation around the sequence of date tokens?

Thx.

Rule elements without actions are allowed, but then the complete action part, including the arrow ->, has to be omitted (quotes escaped):

Token{REGEXP(\"[0-9]{4}\") -> MARK(Time, 1, 5)} Token{REGEXP(\"-\")} Token{REGEXP(\"[0-9]{2}\")} Token{REGEXP(\"-\")} Token{REGEXP(\"[0-9]{2}\")};
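If your rules are assembled as Java strings, as in the question's String[] rules, you can drop empty action parts before the rules reach Ruta. The helper stripEmptyActions below is hypothetical and not part of Ruta. It uses a simple regex that assumes an empty action always looks like "->" followed only by whitespace and "}", and it does not parse Ruta syntax.

```java
public class EmptyActionFix {

    // Hypothetical helper (not part of Ruta): removes "->" parts that have no
    // action after them, e.g. {REGEXP("-") -> } becomes {REGEXP("-")}.
    // Rule elements with a real action, such as -> MARK(Time, 1, 5), are kept.
    static String stripEmptyActions(String rule) {
        return rule.replaceAll("\\s*->\\s*\\}", "}");
    }

    public static void main(String[] args) {
        // The second rule from the question, which fails to parse.
        String broken = "Token{REGEXP(\"[0-9]{4}\") -> MARK(Time, 1, 5)} Token{REGEXP(\"-\") -> } "
                + "Token{REGEXP(\"[0-9]{2}\") -> } Token{REGEXP(\"-\") -> } Token{REGEXP(\"[0-9]{2}\") -> };";

        // Prints the corrected rule shown above (without the Java escapes).
        System.out.println(stripEmptyActions(broken));
    }
}
```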

In your first rule, the additional Time annotations for the individual tokens are created by the actions (-> Time) at those rule elements. If you remove them, you arrive at your second rule, just as you did.
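To see why the first rule yields five Time annotations while the corrected one yields a single one, here is a plain-Java model of the two variants. It does not use UIMA or Ruta: the tokenizer regex, the Span record and the apply method are illustrative assumptions that only mimic the rule elements (each pattern is matched against one whole token, and MARK(Time, 1, 5) becomes one span from the first to the fifth token). The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java model (no UIMA/Ruta) of the two rule variants from the question.
public class RuleActionsDemo {

    // A token or annotation span: [begin, end) offsets into the document text.
    record Span(int begin, int end) {}

    // The five rule elements YYYY - MM - DD; String.matches checks the whole token.
    static final String[] ELEMENTS = {"[0-9]{4}", "-", "[0-9]{2}", "-", "[0-9]{2}"};

    // Same two token classes as the question's tokenizer rules.
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[a-zA-Z0-9]+|[^ a-zA-Z0-9]+").matcher(text);
        while (m.find()) {
            tokens.add(new Span(m.start(), m.end()));
        }
        return tokens;
    }

    // markEveryElement=true models "-> Time" on elements 2..5 (first rule);
    // false models elements without any action (corrected rule).
    static List<Span> apply(String text, List<Span> tokens, boolean markEveryElement) {
        List<Span> times = new ArrayList<>();
        for (int i = 0; i + ELEMENTS.length <= tokens.size(); i++) {
            boolean match = true;
            for (int k = 0; k < ELEMENTS.length && match; k++) {
                Span t = tokens.get(i + k);
                match = text.substring(t.begin(), t.end()).matches(ELEMENTS[k]);
            }
            if (!match) {
                continue;
            }
            // MARK(Time, 1, 5): one annotation from element 1 to element 5.
            times.add(new Span(tokens.get(i).begin(), tokens.get(i + ELEMENTS.length - 1).end()));
            if (markEveryElement) {
                // "-> Time" on elements 2..5: one extra annotation per matched token.
                for (int k = 1; k < ELEMENTS.length; k++) {
                    times.add(tokens.get(i + k));
                }
            }
        }
        return times;
    }

    static List<String> coveredTexts(String text, List<Span> spans) {
        List<String> out = new ArrayList<>();
        for (Span s : spans) {
            out.add(text.substring(s.begin(), s.end()));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Today's date is 2017-04-06.";
        List<Span> tokens = tokenize(text);
        System.out.println(coveredTexts(text, apply(text, tokens, true)));  // [2017-04-06, -, 04, -, 06]
        System.out.println(coveredTexts(text, apply(text, tokens, false))); // [2017-04-06]
    }
}
```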

I'd recommend optimizing your rule a bit, e.g., with a -PARTOF(Time) condition, in order to avoid overlapping annotations.
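What -PARTOF(Time) buys you can be illustrated with the same kind of plain-Java model, again without UIMA or Ruta. The scenario is an assumption chosen for illustration: the date rule runs twice over the same text (for example because it appears in two places of a script). Without the guard the second pass adds a duplicate Time span; with a guard that skips matches whose YYYY token already lies inside a Time span, it does not. All names here are hypothetical, and the record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java model (no UIMA/Ruta) of a -PARTOF(Time) guard on the first rule element.
public class PartOfGuardDemo {

    // [begin, end) offsets into the document text.
    record Span(int begin, int end) {
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    // YYYY - MM - DD, each pattern matched against one whole token.
    static final String[] ELEMENTS = {"[0-9]{4}", "-", "[0-9]{2}", "-", "[0-9]{2}"};

    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[a-zA-Z0-9]+|[^ a-zA-Z0-9]+").matcher(text);
        while (m.find()) {
            tokens.add(new Span(m.start(), m.end()));
        }
        return tokens;
    }

    // One pass of the date rule; new Time spans are appended to `times`.
    // guardPartOf=true mimics -PARTOF(Time): a match is skipped when its
    // YYYY token already lies inside an existing Time span.
    static void applyOnce(String text, List<Span> tokens, List<Span> times, boolean guardPartOf) {
        for (int i = 0; i + ELEMENTS.length <= tokens.size(); i++) {
            Span first = tokens.get(i);
            if (guardPartOf && times.stream().anyMatch(time -> time.covers(first))) {
                continue;
            }
            boolean match = true;
            for (int k = 0; k < ELEMENTS.length && match; k++) {
                Span t = tokens.get(i + k);
                match = text.substring(t.begin(), t.end()).matches(ELEMENTS[k]);
            }
            if (match) {
                times.add(new Span(first.begin(), tokens.get(i + ELEMENTS.length - 1).end()));
            }
        }
    }

    // Runs the rule twice over the same text and reports how many Time spans exist.
    static int timeCountAfterTwoPasses(String text, boolean guardPartOf) {
        List<Span> tokens = tokenize(text);
        List<Span> times = new ArrayList<>();
        applyOnce(text, tokens, times, guardPartOf);
        applyOnce(text, tokens, times, guardPartOf);
        return times.size();
    }

    public static void main(String[] args) {
        String text = "Today's date is 2017-04-06.";
        System.out.println(timeCountAfterTwoPasses(text, false)); // 2 (duplicate Time)
        System.out.println(timeCountAfterTwoPasses(text, true));  // 1
    }
}
```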

I would have written the rule something like this (quotes not escaped):

(NUM{-PARTOF(Time),REGEXP(".{4}")}
 SPECIAL.ct=="-"
 NUM{REGEXP(".{2}")}
 SPECIAL.ct=="-"
 NUM{REGEXP(".{2}")}
){-> Time};
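If your tests run on Java 15 or later, a text block lets you keep the rule exactly as written above, without escaping the quotes. DateRule and DATE_RULE are hypothetical names. In the test from the question the string could then be passed to Ruta.apply(cas.getCas(), DATE_RULE); that call is left out here so the snippet runs without UIMA on the classpath.

```java
public class DateRule {

    // Java 15+ text block: the Ruta rule is pasted as-is, no \" escapes needed.
    // Assumes a Time type exists in the type system, as in the question.
    static final String DATE_RULE = """
        (NUM{-PARTOF(Time),REGEXP(".{4}")}
         SPECIAL.ct=="-"
         NUM{REGEXP(".{2}")}
         SPECIAL.ct=="-"
         NUM{REGEXP(".{2}")}
        ){-> Time};
        """;

    public static void main(String[] args) {
        System.out.print(DATE_RULE);
    }
}
```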

If you use these rule elements in several rules, I'd refactor them into separate annotations, e.g., Dash, Num4 and Num2.
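A sketch of that refactoring, again as a Java 15+ text block. The split into Dash, Num4 and Num2 is only one possible layout, and the Dash rule uses the REGEXP style from the question instead of SPECIAL.ct=="-". It assumes that Time already exists in your type system (as in the question) and that the DECLARE statement is honored by the way you execute the script; if you run it through Ruta.apply on an existing CAS, you may need to define the three types in your type system descriptor instead. RefactoredDateRules and SCRIPT are hypothetical names.

```java
public class RefactoredDateRules {

    // Hypothetical refactoring into reusable helper annotations.
    // Each helper is created once; the date rule then only combines them.
    static final String SCRIPT = """
        DECLARE Dash, Num4, Num2;
        SPECIAL{REGEXP("-") -> Dash};
        NUM{REGEXP(".{4}") -> Num4};
        NUM{REGEXP(".{2}") -> Num2};
        (Num4{-PARTOF(Time)} Dash Num2 Dash Num2){-> Time};
        """;

    public static void main(String[] args) {
        System.out.print(SCRIPT);
    }
}
```

Other rules that need a dash or a two-digit number can then refer to Dash or Num2 directly instead of repeating the REGEXP conditions.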

DISCLAIMER: I am a developer of UIMA Ruta
