How to remove UIMA annotations?

Question

I'm using some UIMA annotators in a pipeline. It runs tasks like:

tokenizer
sentence splitter
gazetteer
My Annotator

The problem is that I don't want to write ALL the annotations (Token, Sentence, SubToken, Time, myAnnotations, etc.) to disk, because the files get very large quickly.

I want to remove all the annotations and keep only the ones created by My Annotator.

I'm working with the following libraries:

uimaFIT 2.0.0
ClearTK 1.4.1
Maven

And I'm using an org.apache.uima.fit.pipeline.SimplePipeline with:

SimplePipeline.runPipeline(
    UriCollectionReader.getCollectionReaderFromDirectory(filesDirectory), //directory with text files
    UriToDocumentTextAnnotator.getDescription(),
    StanfordCoreNLPAnnotator.getDescription(),//stanford tokenize, ssplit, pos, lemma, ner, parse, dcoref
    AnalysisEngineFactory.createEngineDescription(//
        XWriter.class, 
        XWriter.PARAM_OUTPUT_DIRECTORY_NAME, outputDirectory,
        XWriter.PARAM_FILE_NAMER_CLASS_NAME, ViewURIFileNamer.class.getName())
);

What I'm trying to do is use the Stanford NLP annotator (from ClearTK) and remove the useless annotations.

How do I do this?

From what I know, you can call the removeFromIndexes() method on an Annotation instance.

Do I need to create a UIMA processor and add it to my pipeline?

Answer 1

In the end I created an engine to remove the useless annotations:

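import java.util.ArrayList;
import java.util.List;

import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.cas.TOP;
import org.apache.uima.resource.ResourceInitializationException;

public class AnnotationRemover extends JCasAnnotator_ImplBase {
    public static AnalysisEngineDescription getDescription() throws ResourceInitializationException {
        return AnalysisEngineFactory.createEngineDescription(AnnotationRemover.class);
    }

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        super.initialize(context);
    }

    @Override
    public void process(JCas jCas) throws AnalysisEngineProcessException {
        // Copy the indexed feature structures first, so that removing them
        // does not modify the index while it is being iterated.
        List<TOP> tops = new ArrayList<TOP>(JCasUtil.selectAll(jCas));
        for (TOP t : tops) {
            // Keep only mypackage.MyAnnotation; remove everything else.
            if (!t.getType().getName().equals("mypackage.MyAnnotation")) {
                t.removeFromIndexes();
            }
        }
    }
}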

I'm removing all the annotations, leaving only the mypackage.MyAnnotation ones.
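The engine above copies the result of JCasUtil.selectAll into an ArrayList before looping, presumably so that removals do not change the collection being iterated. The following is only a plain-Java sketch of that pattern, not UIMA code: a List<String> of made-up names stands in for the CAS index, and SnapshotRemovalDemo, removeAllExcept and failsWithoutSnapshot are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SnapshotRemovalDemo {

    /** Removes every name except {@code keep}, iterating over a snapshot copy. */
    public static void removeAllExcept(List<String> index, String keep) {
        List<String> snapshot = new ArrayList<>(index); // copy first
        for (String name : snapshot) {
            if (!name.equals(keep)) {
                index.remove(name); // mutates the live list, not the snapshot
            }
        }
    }

    /** Same loop without the copy; returns true if it failed mid-iteration. */
    public static boolean failsWithoutSnapshot(List<String> index, String keep) {
        try {
            for (String name : index) {
                if (!name.equals(keep)) {
                    index.remove(name); // modifies the list being iterated
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in "index" contents; the names are illustrative only.
        List<String> a = new ArrayList<>(Arrays.asList("Token", "Sentence", "MyAnnotation"));
        removeAllExcept(a, "MyAnnotation");
        System.out.println(a); // prints [MyAnnotation]

        List<String> b = new ArrayList<>(Arrays.asList("Token", "Sentence", "MyAnnotation"));
        System.out.println(failsWithoutSnapshot(b, "MyAnnotation")); // prints true
    }
}
```

How a real CAS index reacts to modification during iteration depends on the UIMA version, so copying first is the conservative choice.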

Answer 2

Yes: add another annotator between MyAnnotator and XWriter that removes all annotations except yours.

Answer 3

I rewrote German Attanasio's solution using Java 8 and changed it to remove anything whose type name does not start with the given annotationTypePrefix:

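public void filterAnnotations(JCas jcas, String annotationTypePrefix) {
    // Copy first (needs java.util.ArrayList), so that removal does not
    // modify the index while the stream is still reading from it.
    new ArrayList<TOP>(JCasUtil.selectAll(jcas))
            .stream()
            .filter(t -> !t.getType().getName().startsWith(annotationTypePrefix))
            .forEach(TOP::removeFromIndexes);
}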
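To see which types a prefix filter like this would drop, the predicate can be tried on plain type-name strings. This is a minimal sketch without any UIMA classes. PrefixFilterDemo and namesToRemove are hypothetical names, and the two type names marked as hypothetical in the comments are invented for the example. It suggests two things to watch for: the built-in uima.tcas.DocumentAnnotation does not match a custom prefix and is removed too, and a prefix without a trailing dot also keeps types from packages that merely share the same leading characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PrefixFilterDemo {

    /** Returns the type names the filter would remove: those NOT starting with keepPrefix. */
    public static List<String> namesToRemove(List<String> typeNames, String keepPrefix) {
        return typeNames.stream()
                .filter(name -> !name.startsWith(keepPrefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "uima.tcas.DocumentAnnotation",
                "org.cleartk.token.type.Token",
                "mypackage.MyAnnotation",
                "mypackage.sub.OtherAnnotation", // hypothetical type
                "mypackagetools.Helper");        // hypothetical type

        // With the trailing dot, only types inside mypackage (and subpackages) are kept.
        System.out.println(namesToRemove(names, "mypackage."));
        // prints [uima.tcas.DocumentAnnotation, org.cleartk.token.type.Token, mypackagetools.Helper]

        // Without the dot, mypackagetools.Helper is kept as well.
        System.out.println(namesToRemove(names, "mypackage"));
        // prints [uima.tcas.DocumentAnnotation, org.cleartk.token.type.Token]
    }
}
```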

Answer 1: score 7, accepted, 2014-01-01 23:11:10
Answer 2: score 2, 2013-12-31 12:30:06
Answer 3: score 1, 2018-12-06 14:40:43