
How to remove UIMA annotations?

I'm using some UIMA annotators in a pipeline. It runs tasks like:

  • tokenizer
  • sentence splitter
  • gazetizer
  • My Annotator

The problem is that I don't want to write ALL the annotations (Token, Sentence, SubToken, Time, myAnnotations, etc.) to disk, because the files get very large quickly.

I want to remove all the annotations and keep only the ones created by My Annotator.

I'm working with the following libraries:

  1. uimaFIT 2.0.0
  2. ClearTK 1.4.1
  3. Maven

And I'm using an org.apache.uima.fit.pipeline.SimplePipeline with:

    SimplePipeline.runPipeline(
        UriCollectionReader.getCollectionReaderFromDirectory(filesDirectory), // directory with text files
        StanfordCoreNLPAnnotator.getDescription(), // Stanford tokenize, ssplit, pos, lemma, ner, parse, dcoref
        AnalysisEngineFactory.createEngineDescription(
            XWriter.class,
            XWriter.PARAM_OUTPUT_DIRECTORY_NAME, outputDirectory,
            XWriter.PARAM_FILE_NAMER_CLASS_NAME, ViewURIFileNamer.class.getName()));

What I'm trying to do is use the Stanford NLP annotator (from ClearTK) and remove the useless annotations.

How do I do this?

From what I know, you can call the removeFromIndexes() method on an Annotation instance.

Do I need to create a UIMA processor and add it to my pipeline?

Finally, I created an engine to remove the useless annotations:

public class AnnotationRemover extends JCasAnnotator_ImplBase {
    public static AnalysisEngineDescription getDescription() throws ResourceInitializationException {
        return AnalysisEngineFactory.createEngineDescription(AnnotationRemover.class);
    }

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        super.initialize(context);
    }

    @Override
    public void process(JCas jCas) throws AnalysisEngineProcessException {
        // Copy into a list first so removal does not modify the collection being iterated.
        List<TOP> tops = new ArrayList<TOP>(JCasUtil.selectAll(jCas));
        for (TOP t : tops) {
            if (!t.getType().getName().equals("mypackage.MyAnnotation")) {
                t.removeFromIndexes();
            }
        }
    }
}

This removes all the annotations, leaving only the mypackage.MyAnnotation annotations.
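The copy into a new ArrayList before the loop matters: the loop removes entries from the same index it is reading. Here is a minimal plain-Java sketch of that snapshot-then-remove pattern, with no UIMA dependency. The class `SnapshotRemovalSketch`, the method `keepOnly`, and the use of a list of type names as the "index" are all hypothetical stand-ins, not part of UIMA or uimaFIT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: the "index" is just a mutable list of type names.
final class SnapshotRemovalSketch {

    // Removes every entry from index except those equal to keepType.
    static void keepOnly(List<String> index, String keepType) {
        // Snapshot first. Removing from `index` inside a for-each over
        // `index` itself typically throws ConcurrentModificationException.
        List<String> snapshot = new ArrayList<>(index);
        for (String typeName : snapshot) {
            if (!typeName.equals(keepType)) {
                index.remove(typeName); // removes one matching occurrence
            }
        }
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(Arrays.asList(
                "Token", "mypackage.MyAnnotation", "Sentence", "Token"));
        keepOnly(index, "mypackage.MyAnnotation");
        System.out.println(index);
    }
}
```

In the real engine the same role is played by `new ArrayList<TOP>(JCasUtil.selectAll(jCas))`, and the removal is `t.removeFromIndexes()`.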


I rewrote German Attanasio's solution using Java 8 and changed it to remove anything whose type name does not start with annotationTypePrefix:

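public void filterAnnotations(JCas jcas, String annotationTypePrefix) {
    // Copy first, then remove everything whose type name lacks the prefix.
    new ArrayList<>(JCasUtil.selectAll(jcas)).stream()
            .filter(t -> !t.getType().getName().startsWith(annotationTypePrefix))
            .forEach(t -> t.removeFromIndexes());
}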
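One caveat with a plain startsWith check: a prefix such as mypackage also matches a type like mypackage2.Other. Here is a sketch, in plain Java with no UIMA dependency, of a predicate that only matches at a package boundary. The class `TypePrefixSketch` and its methods are hypothetical and not part of uimaFIT, and the type names are only sample data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; operates on type names as plain strings.
final class TypePrefixSketch {

    // True if typeName equals the prefix or lies under it as a package.
    // A trailing dot on the prefix is tolerated.
    static boolean inPackage(String typeName, String packagePrefix) {
        String base = packagePrefix.endsWith(".")
                ? packagePrefix.substring(0, packagePrefix.length() - 1)
                : packagePrefix;
        return typeName.equals(base) || typeName.startsWith(base + ".");
    }

    // Returns the type names that a boundary-aware prefix filter would remove.
    static List<String> toRemove(List<String> typeNames, String packagePrefix) {
        return typeNames.stream()
                .filter(name -> !inPackage(name, packagePrefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "mypackage.MyAnnotation", "mypackage2.Other",
                "org.cleartk.token.type.Token");
        System.out.println(toRemove(names, "mypackage"));
    }
}
```

Passing the prefix with a trailing dot (for example "mypackage.") to the original startsWith filter has a similar effect without extra code.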
