UIMA Ruta: Creating new annotations by combining existing annotation's features in plain Java
I'm trying to convert the following logic into a UIMA Ruta rule:
Sentence {->NewAnnotation}
IF Sentence.part1
contains Constituent.label="VB"
AND Sentence.part2
contains Constituent.label="VBZ"
In other words, I need to create a new annotation that covers the entire Sentence whenever its features part1 and part2 contain a combination or sequence of specific POS tags (Constituent.label).
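To pin down the intended logic independently of Ruta, here is a minimal plain-Java sketch of the same check. The Sentence class and its part1Labels/part2Labels fields are hypothetical stand-ins for the UIMA types, not the real type system; each part is reduced to the list of Constituent labels it covers.

```java
import java.util.List;

public class SentenceRuleSketch {
    // Hypothetical stand-in for the Sentence type: each part is modelled
    // only by the labels of the Constituents it covers.
    static final class Sentence {
        final List<String> part1Labels;
        final List<String> part2Labels;

        Sentence(List<String> part1Labels, List<String> part2Labels) {
            this.part1Labels = part1Labels;
            this.part2Labels = part2Labels;
        }
    }

    // Mirrors the pseudo-rule: part1 contains "VB" AND part2 contains "VBZ".
    static boolean qualifiesForNewAnnotation(Sentence s) {
        return s.part1Labels.contains("VB") && s.part2Labels.contains("VBZ");
    }

    public static void main(String[] args) {
        Sentence s = new Sentence(List.of("NN", "VB"), List.of("VBZ"));
        System.out.println(qualifiesForNewAnnotation(s));
    }
}
```

A sentence for which this method returns true is one that the Ruta rule should mark with NewAnnotation.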
At first, the intuitive answer for me was to use the CONTAINS condition along with a STRINGLIST (and config parameters) in the following manner:
STRINGLIST posList; //assuming it is declared
Sentence{-> NewAnnotation} <-{Sentence.part1{CONTAINS(posList, Constituent.label)};};
But it doesn't produce any annotations (and it doesn't fail either).
Then I considered the GETFEATURE action: storing the Sentence feature (Sentence.part1) in a string variable and using it separately in the main rule. However, GETFEATURE saves the feature as a STRING, so I cannot use it to produce annotations (for that I need the ANNOTATION type). The same happens with the MATCHEDTEXT action.
I understand that the rule I want to build is quite complex, but I believe Ruta is the most suitable option for such tasks. So, can you please suggest how to deal with my problem?
As @PeterKluegl already stated, the solution to the original question would be:
As @PeterKluegl already stated, the solution to the original question would be:正如@PeterKluegl 已经说过的,原始问题的解决方案是:
Sentence{-> NewAnnotation} <-{Sentence.part1<-{Constituent.label=="VB";} %
Sentence.part2<-{Constituent.label=="VB";};};
Mind that this rule works only if the Sentence features (e.g. part1) are annotations rather than strings, which is what they are in my case.
So, for anyone interested, I am also posting the solution I used in my case:
Store the Sentence features in separate annotations, while keeping the link between Sentence.part1 and its parent Sentence (this is possible in UIMA via parent pointers).
Apply the following rule:
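// Ruta here is org.apache.uima.ruta.engine.Ruta; cas is the CAS being processed.
String rutaRule = "STRING id;"
        + "STRING part1Id;"
        + "STRING part2Id;"
        // Read the Sentence id and the parent ids of both parts into variables.
        + "Sentence{->GETFEATURE(\"matchId\", id)};"
        + "part1{->GETFEATURE(\"parent\", part1Id)};"
        + "part2{->GETFEATURE(\"parent\", part2Id)};"
        // Annotate the Sentence only if both parts point back to it and match the POS patterns.
        + "Sentence{AND(IF(id == part1Id), IF(id == part2Id))-> NewAnnotation} <-"
        + "{part1<-{Constituent.label == \"VBD\";} % "
        + "part2<-{Constituent.label == \"MD\" # Constituent.label == \"VBN\";};};";
Ruta.apply(cas, rutaRule);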
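For readers who find the id comparison hard to follow, here is a rough plain-Java sketch of what the rule is meant to check. It has no UIMA dependency: the Part class, its parentId and labels fields, and both method names are hypothetical illustrations, and the # wildcard is approximated as "the first label occurs and the second occurs somewhere after it".

```java
import java.util.List;

public class ParentLinkSketch {
    // Hypothetical stand-in for a part1/part2 annotation: it points back to
    // its Sentence by id and carries the labels of the Constituents it covers.
    static final class Part {
        final String parentId;
        final List<String> labels;

        Part(String parentId, List<String> labels) {
            this.parentId = parentId;
            this.labels = labels;
        }
    }

    // True if `first` occurs in labels and `second` occurs at a later position.
    static boolean containsInOrder(List<String> labels, String first, String second) {
        int i = labels.indexOf(first);
        return i >= 0 && labels.subList(i + 1, labels.size()).contains(second);
    }

    // Both parts must belong to the sentence, part1 must contain "VBD",
    // and part2 must contain "MD" followed later by "VBN".
    static boolean qualifies(String sentenceId, Part part1, Part part2) {
        return sentenceId.equals(part1.parentId)
                && sentenceId.equals(part2.parentId)
                && part1.labels.contains("VBD")
                && containsInOrder(part2.labels, "MD", "VBN");
    }

    public static void main(String[] args) {
        Part p1 = new Part("s1", List.of("NN", "VBD"));
        Part p2 = new Part("s1", List.of("MD", "RB", "VBN"));
        System.out.println(qualifies("s1", p1, p2));
    }
}
```

The parent id is what ties each part back to its Sentence, which is the role the GETFEATURE calls and the IF conditions play in the Ruta rule.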
Hope this helps.