How to remove UIMA annotations?
I'm using some UIMA annotators in a pipeline. It runs tasks like:
The problem is that I don't want to write ALL the annotations (Token, Sentence, SubToken, Time, myAnnotations, etc.) to disk, because the files get very large very quickly.
I want to remove all the annotations and keep only the ones created by My Annotator.
I'm working with the following libraries:
And I'm using an org.apache.uima.fit.pipeline.SimplePipeline with:
SimplePipeline.runPipeline(
    UriCollectionReader.getCollectionReaderFromDirectory(filesDirectory), // directory with text files
    UriToDocumentTextAnnotator.getDescription(),
    StanfordCoreNLPAnnotator.getDescription(), // Stanford tokenize, ssplit, pos, lemma, ner, parse, dcoref
    AnalysisEngineFactory.createEngineDescription(
        XWriter.class,
        XWriter.PARAM_OUTPUT_DIRECTORY_NAME, outputDirectory,
        XWriter.PARAM_FILE_NAMER_CLASS_NAME, ViewURIFileNamer.class.getName())
);
What I'm trying to do is use the Stanford NLP annotator (from ClearTK) and remove the useless annotations.
How do I do this?
From what I know, you can call the removeFromIndexes() method on an Annotation instance.
Do I need to create a UIMA processor and add it to my pipeline?
In the end I created an engine to remove the useless annotations:
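import java.util.ArrayList;
import java.util.List;
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.cas.TOP;
import org.apache.uima.resource.ResourceInitializationException;

public class AnnotationRemover extends JCasAnnotator_ImplBase {
    public static AnalysisEngineDescription getDescription() throws ResourceInitializationException {
        return AnalysisEngineFactory.createEngineDescription(AnnotationRemover.class);
    }
    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        super.initialize(context);
    }
    @Override
    public void process(JCas jCas) throws AnalysisEngineProcessException {
        // Copy first so that removal does not modify the collection being iterated.
        List<TOP> tops = new ArrayList<TOP>(JCasUtil.selectAll(jCas));
        for (TOP t : tops) {
            if (!t.getType().getName().equals("mypackage.MyAnnotation"))
                t.removeFromIndexes();
        }
    }
}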
This removes every annotation except the mypackage.MyAnnotation annotations.
Yes: add another annotator between MyAnnotator and XWriter that removes all annotations except yours.
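As a rough sketch of that ordering, the remover can be passed to runPipeline just before the writer. The class FilteredPipeline, the method run and its parameter names are made up for illustration; AnnotationRemover is the engine posted above, and the other descriptions are assumed to be built as in the question. It needs uimaFIT on the classpath.

```java
import java.io.IOException;
import org.apache.uima.UIMAException;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.collection.CollectionReader;
import org.apache.uima.fit.pipeline.SimplePipeline;

public final class FilteredPipeline {
    // Hypothetical helper: the caller builds the reader and the engine descriptions.
    public static void run(CollectionReader reader,
                           AnalysisEngineDescription textLoader,
                           AnalysisEngineDescription nlp,
                           AnalysisEngineDescription myAnnotator,
                           AnalysisEngineDescription writer)
            throws UIMAException, IOException {
        // Engines run in the order given, so the remover sees everything
        // produced upstream and the writer only sees what is left.
        SimplePipeline.runPipeline(
                reader,
                textLoader,
                nlp,
                myAnnotator,
                AnnotationRemover.getDescription(),
                writer);
    }
}
```

The remover has to come after every annotator whose output your own annotator reads; otherwise those inputs are gone before they are used.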
I rewrote German Attanasio's solution using Java 8 and changed it to filter out anything whose type name does not start with a given annotationTypePrefix:
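public void filterAnnotations(JCas jcas, String annotationTypePrefix) {
    // Defensive copy (as in the loop version above), so that removal
    // does not disturb the index while it is being iterated.
    new ArrayList<>(JCasUtil.selectAll(jcas))
            .stream()
            .filter(t -> !t.getType().getName().startsWith(annotationTypePrefix))
            .forEach(TOP::removeFromIndexes);
}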
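One thing to watch with a prefix check: a bare package name such as "mypackage" also matches "mypackage2.Other". The plain-Java sketch below (no UIMA needed) shows one way to guard against that by appending a dot. PrefixFilterDemo, shouldRemove and kept are names invented for this example, and the type names are only sample strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PrefixFilterDemo {
    /** True when the type name lies outside the kept package and should be removed. */
    static boolean shouldRemove(String typeName, String keptPackage) {
        // Append a dot so "mypackage" does not also match "mypackage2.Other".
        String prefix = keptPackage.endsWith(".") ? keptPackage : keptPackage + ".";
        return !typeName.startsWith(prefix);
    }

    /** Type names that would survive the filter, in input order. */
    static List<String> kept(List<String> typeNames, String keptPackage) {
        return typeNames.stream()
                .filter(n -> !shouldRemove(n, keptPackage))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "mypackage.MyAnnotation",
                "mypackage2.Other",
                "org.cleartk.token.type.Token");
        System.out.println(kept(names, "mypackage")); // prints [mypackage.MyAnnotation]
    }
}
```

In the UIMA version the same predicate would be applied to t.getType().getName() inside the filter.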