
How to create an AnalysisEngineDescriptor from a uima-ruta script to use in a SimplePipeline

I can't get a UIMA Ruta script to run in my simple pipeline. I'm working with the following libraries:

  1. Uimafit 2.0.0
  2. Uima-ruta 2.0.1
  3. ClearTK 1.4.1
  4. Maven

I'm running an org.apache.uima.fit.pipeline.SimplePipeline like this:

SimplePipeline.runPipeline(
    UriCollectionReader.getCollectionReaderFromDirectory(filesDirectory), //directory with text files
    UriToDocumentTextAnnotator.getDescription(),
    StanfordCoreNLPAnnotator.getDescription(),//stanford tokenize, ssplit, pos, lemma, ner, parse, dcoref

    AnalysisEngineFactory.createEngineDescription(RUTA_ANALYSIS_ENGINE),//RUTA script

    AnalysisEngineFactory.createEngineDescription(//
        XWriter.class, 
        XWriter.PARAM_OUTPUT_DIRECTORY_NAME, outputDirectory,
        XWriter.PARAM_FILE_NAMER_CLASS_NAME, ViewURIFileNamer.class.getName())
);

What I'm trying to do is run the Stanford CoreNLP annotator (from ClearTK) and then apply a Ruta script. Currently everything runs without errors and the default Ruta annotations are added to the CAS, but the annotations that my own rules should create are not.

My script is:

PACKAGE edu.isistan.carcha.concern;
TYPESYSTEM org.cleartk.ClearTKTypeSystem; 
DECLARE persistence
Token{FEATURE("lemma","storage") -> MARK(persistence)};

Looking at the annotated file (the screenshot is not preserved here), the basic Ruta annotations like "SPACE" or "SW" are there, so the RutaEngine is being created and added to the pipeline.

How do I properly create an AnalysisEngineDescriptor to run a Ruta script?

Note: RUTA_ANALYSIS_ENGINE is the engine descriptor that I copied from the Ruta Workbench.

Try adding a semicolon after the declaration and using the fully qualified name for the Token annotation:

PACKAGE edu.isistan.carcha.concern;
TYPESYSTEM org.cleartk.ClearTKTypeSystem; 
DECLARE persistence;
org.cleartk.token.type.Token{FEATURE("lemma","storage") -> MARK(persistence)};

Type aliasing in Ruta is a little too aggressive. Every type known to your pipeline is available by its short name, even if you do not import it in your script. If more than one Token type is available to your pipeline, there is currently no way to know which one will be picked (see https://issues.apache.org/jira/browse/UIMA-3322?filter=-2 ).
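To make the short-name problem concrete, here is a toy sketch in plain Java. It is not Ruta's actual resolution code, just an illustration of why a bare Token can be ambiguous: it groups fully qualified type names by their last segment and reports any short name that maps to more than one type. The class name ShortNameAmbiguity and the second type com.example.other.Token are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShortNameAmbiguity {

    // Group fully qualified type names by their short name (text after the last dot).
    static Map<String, List<String>> byShortName(List<String> qualifiedNames) {
        Map<String, List<String>> index = new TreeMap<>();
        for (String qualified : qualifiedNames) {
            String shortName = qualified.substring(qualified.lastIndexOf('.') + 1);
            index.computeIfAbsent(shortName, k -> new ArrayList<>()).add(qualified);
        }
        return index;
    }

    // Short names that point at more than one fully qualified type.
    static List<String> ambiguousShortNames(List<String> qualifiedNames) {
        List<String> ambiguous = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : byShortName(qualifiedNames).entrySet()) {
            if (entry.getValue().size() > 1) {
                ambiguous.add(entry.getKey());
            }
        }
        return ambiguous;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList(
                "org.cleartk.token.type.Token",    // the type used in the answer above
                "org.cleartk.token.type.Sentence",
                "com.example.other.Token");        // hypothetical second Token type
        // Token is reported because two types share that short name.
        System.out.println("Ambiguous short names: " + ambiguousShortNames(types));
    }
}
```

When a short name shows up in that list, writing the fully qualified name in the rule (as in the corrected script) removes the guesswork.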
