Accessing annotations in UIMA

Is there a way in UIMA to access annotations starting from the tokens, the way the CAS debugger GUI does? You can of course get all the annotations from the index repository, but I want to loop over the tokens and get all the annotations associated with each token.

The reason is simple: I want to check some annotations and discard the others, and this way is much easier. Any help is appreciated :)

I'm a uimaFIT developer.

If you want to find all annotations within the boundaries of another annotation, you may prefer the shorter and faster variant:

JCasUtil.selectCovered(MyType.class, referenceAnnotation); // MyType: placeholder for the annotation class to select

Note that it is not a good idea to create a "dummy" annotation with the desired offsets and then search within its boundaries, because this immediately allocates memory in the CAS, which is not garbage-collected unless the complete CAS is collected.
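
To make the "covered" idea concrete without any UIMA dependency, here is a minimal plain-Java sketch of looping over tokens and collecting whatever lies inside each one. It is only a model of the semantics, not the uimaFIT implementation: `Span`, `CoveredSketch`, and the type labels are hypothetical names, and the choice to leave the covering span out of its own result is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of "covered" selection. This is NOT the UIMA or uimaFIT API;
// Span and CoveredSketch are hypothetical names used only for illustration.
final class Span {
    final String type; // hypothetical type label, e.g. "Token"
    final int begin;   // start offset, inclusive
    final int end;     // end offset, exclusive

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + ")";
    }
}

final class CoveredSketch {
    // Returns every span lying fully inside the window [begin, end), in input order.
    static List<Span> covered(List<Span> all, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            if (s.begin >= begin && s.end <= end) {
                result.add(s);
            }
        }
        return result;
    }

    // Uses an existing span as the window, so no "dummy" span has to be created.
    // The covering span itself is left out of the result (a choice made for this sketch).
    static List<Span> covered(List<Span> all, Span covering) {
        List<Span> result = covered(all, covering.begin, covering.end);
        result.remove(covering); // Span does not override equals, so this removes by identity
        return result;
    }

    public static void main(String[] args) {
        List<Span> all = Arrays.asList(
                new Span("Token", 0, 5), new Span("Token", 6, 11),
                new Span("PartOfSpeech", 0, 5), new Span("Entity", 0, 11));
        // Loop over the tokens and list what each one covers.
        for (Span s : all) {
            if (s.type.equals("Token")) {
                System.out.println(s + " covers " + covered(all, s));
            }
        }
    }
}
```

In real code the loop body would call the uimaFIT method on each token instead; annotations that are longer than the token (such as the `Entity` span here) are not "covered" by it and need a separate lookup.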

After searching and asking the developers of cTAKES (Apache clinical Text Analysis and Knowledge Extraction System), it turns out you can use the library "uimafit", which can be found at http://code.google.com/p/uimafit/ . The following code can be used:

List<MyType> list = JCasUtil.selectCovered(jcas, MyType.class, startIndex, endIndex); // MyType: placeholder for a class extending Annotation

This returns all annotations of that type between the two offsets.

Hope that helps.

If you don't want to use uimaFIT, you can create a filtered iterator to loop through the annotations of interest; see the UIMA reference documentation.

I recently used this approach in some code to find a sentence annotation that encompassed a regex annotation. This approach was acceptable for our project because all regular expression matches were shorter than the sentences in the document, and there was only one regex match per sentence. Obviously, depending on the indexing rules, your mileage may vary. If you are afraid of running into another shorterAnnotationType, put the inner code into a while loop:

import java.util.ArrayList;

import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.FSTypeConstraint;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

// annotationsPair is the author's own holder class for two annotations (not shown here).
// The two Annotation arguments are sample instances: their types select what to pair,
// and shorterAnnotationType is also the position the iterator starts from.
static ArrayList<annotationsPair> process(Annotation shorterAnnotationType, 
        Annotation longerAnnotationType, JCas aJCas){

    ArrayList<annotationsPair> annotationsList = new ArrayList<annotationsPair>();

    //restrict the annotation index to the two types of interest
    FSIterator it = aJCas.getAnnotationIndex().iterator();
    FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
    constraint.add(shorterAnnotationType.getType());
    constraint.add(longerAnnotationType.getType());
    it = aJCas.createFilteredIterator(it, constraint);

    Annotation a = null;
    int shorterBegin = -1;
    int shorterEnd = -1;
    it.moveTo(shorterAnnotationType);
    while (it.isValid()) {
        a = (Annotation) it.get();
        if (a.getClass() == shorterAnnotationType.getClass()){
            shorterBegin = a.getBegin();
            shorterEnd = a.getEnd();
            System.out.println("Target annotation from " + shorterBegin 
                    + " to " + shorterEnd);
            //because assume that sentence type is longer than other type, 
            //the sentence gets indexed prior
            it.moveToPrevious(); 
            if(it.isValid()){
                Annotation prevAnnotation = (Annotation) it.get();
                if (prevAnnotation.getClass() == longerAnnotationType.getClass()){
                    int sentBegin = prevAnnotation.getBegin();
                    int sentEnd = prevAnnotation.getEnd();
                    System.out.println("found annotation [" + prevAnnotation.getCoveredText()
                            + "] location: " + sentBegin + ", " + sentEnd);
                    annotationsPair pair = new annotationsPair(a, prevAnnotation);
                    annotationsList.add(pair);
                }
                //return to where you started
                it.moveToNext(); //will not invalidate iter because just came from next
            } else {
                //a was the first element, so stepping back invalidated the iterator;
                //go back to a so the loop can continue
                it.moveToFirst();
            }
        }
        it.moveToNext();
    }

    return annotationsList;

}
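
The "step back one position" trick above relies on the default ordering of the UIMA annotation index: begin offset ascending, then end offset descending, so a longer annotation comes before a shorter one it encloses. The plain-Java sketch below models that ordering and the while-loop variant mentioned earlier, walking backwards until an enclosing annotation of the wanted type is found. It has no UIMA dependency; `Ann`, `EnclosingSketch`, and the type labels are hypothetical, and type priorities (which UIMA uses to break ties between identical offsets) are ignored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of annotation index order. This is NOT the UIMA API;
// Ann and EnclosingSketch are hypothetical names used only for illustration.
final class Ann {
    final String type; // hypothetical type label, e.g. "Sentence"
    final int begin;   // start offset, inclusive
    final int end;     // end offset, exclusive

    Ann(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + ")";
    }
}

final class EnclosingSketch {
    // Begin ascending, then end descending: the longer of two spans that start together comes first.
    static final Comparator<Ann> INDEX_ORDER = (x, y) -> {
        if (x.begin != y.begin) {
            return Integer.compare(x.begin, y.begin);
        }
        return Integer.compare(y.end, x.end);
    };

    // Pairs each span of innerType with the nearest preceding span of outerType that encloses it.
    // Assumes outer spans are strictly longer than inner ones, as in the answer above.
    static List<Ann[]> pairWithEnclosing(List<Ann> anns, String innerType, String outerType) {
        List<Ann> sorted = new ArrayList<>(anns);
        sorted.sort(INDEX_ORDER);
        List<Ann[]> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Ann inner = sorted.get(i);
            if (!inner.type.equals(innerType)) {
                continue;
            }
            // Walk backwards past unrelated spans instead of checking only one step.
            for (int j = i - 1; j >= 0; j--) {
                Ann candidate = sorted.get(j);
                if (candidate.type.equals(outerType)
                        && candidate.begin <= inner.begin && candidate.end >= inner.end) {
                    pairs.add(new Ann[] {inner, candidate});
                    break;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Ann> anns = Arrays.asList(
                new Ann("Regex", 9, 10), new Ann("Sentence", 7, 13),
                new Ann("Token", 0, 1), new Ann("Regex", 2, 3), new Ann("Sentence", 0, 6));
        for (Ann[] pair : pairWithEnclosing(anns, "Regex", "Sentence")) {
            System.out.println(pair[0] + " is inside " + pair[1]);
        }
    }
}
```

With a filtered iterator that only contains the two types, the backwards walk usually stops after one step, which is why the single `moveToPrevious()` was enough in that project.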

Hope this helps! Disclaimer: I am new to UIMA.
