Ruta in a UIMA environment: using predefined collections/sets and lexicons in plain Java

Question

I'm a beginner with Ruta, and what I'm trying to grasp now is how to handle class variables/collections within a UIMA environment (in plain Java). I've tried following the examples given in the documentation, but there the Ruta rules are applied either externally as a script file or right "on the spot" using Ruta.apply(cas, rule). Neither of these options seems to let me use, for example, a file lexicon or any predefined Java collections. Could you please give me any hints or solutions to my problem?

Generally, I use UIMA AEs to parse sentences and then use the created annotations within a Ruta script to match specific types of sentences based on their syntactic structure. The Ruta rules I write are therefore fairly simple but bulky because of the set of POS tags, so I would like to get some flexibility inside Ruta. I would be grateful for any suggestions on this topic as well.

EDIT: For example, I have a rule that considers a set of POS tags created by an AE (Stanford Parser). To match the desired sentence structure, I hardcode it in the following way (I realize it's the most naive way):

String rutaSampleRule = "BLOCK(ForEach) Sentence{}{Document{-> Asyndeton} "
    + "<- {((Constituent.label==\"NN\" COMMA Constituent.label==\"NN\") |"
    + " (Constituent.label==\"NNP\" COMMA Constituent.label==\"NNP\") |"
    + " (Constituent.label==\"NNPS\" COMMA Constituent.label==\"NNPS\") |"
    + " (Constituent.label==\"NNS\" COMMA Constituent.label==\"NNS\"));};}";
Ruta.apply(cas, rutaSampleRule);

What I would like instead is to declare a collection of such POS tags (e.g. NNS, NN), iterate over it inside Ruta, and match the respective sentence structure (here, consecutive nouns). This would make my rules much more flexible and practical.

The second option would be to use lexicons instead of a collection, but I thought they could be used (with MARKFAST) only within Ruta itself, not from plain Java; at least I could not find any examples.

So, to summarize my question: is it possible (and if so, how) to work with externally defined collections/lexicons from plain Java in simple Ruta scripts that do not introduce any new types?

I hope I managed to explain it better. Thanks in advance.

EDIT 1: I figured out how to use lexicons from plain Java just by playing around with paths and the example in the guide book. Still, I would like to know how to assign values to variables by using the configuration parameters.

Answer 1

This should do the trick (tested with current trunk):

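import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.uima.ruta.engine.Ruta;
import org.apache.uima.ruta.engine.RutaEngine;

// The script declares an empty STRINGLIST; its values are injected from Java below.
String rutaSampleRule = "STRINGLIST posList;"
    + "Sentence{-> Asyndeton} <- {"
    + "c1:Constituent{CONTAINS(posList, c1.label)} COMMA c2:Constituent{c2.label == c1.label};"
    + "};";

// Variable names and their comma-separated values are passed as parallel arrays.
List<String> posList = Arrays.asList("NN", "NNP", "NNPS", "NNS");
Map<String, Object> additionalParams = new HashMap<>();
additionalParams.put(RutaEngine.PARAM_VAR_NAMES, new String[] { "posList" });
additionalParams.put(RutaEngine.PARAM_VAR_VALUES, new String[] { String.join(",", posList) });
Ruta.apply(cas, rutaSampleRule, additionalParams);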
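If several list variables have to be filled this way, the two parallel arrays can be built from an ordinary Java map. The following is only a minimal sketch using the standard library: the class RutaVarParams and its method are hypothetical helpers, not part of UIMA Ruta, and the two key strings are assumed to be the values behind RutaEngine.PARAM_VAR_NAMES and RutaEngine.PARAM_VAR_VALUES. In real code the constants themselves should be used as keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of UIMA Ruta.
final class RutaVarParams {
    // Assumption: these mirror RutaEngine.PARAM_VAR_NAMES and
    // RutaEngine.PARAM_VAR_VALUES; prefer the constants when Ruta is on the classpath.
    static final String VAR_NAMES_KEY = "varNames";
    static final String VAR_VALUES_KEY = "varValues";

    // Turns {variable name -> values} into two parallel String arrays:
    // one with the names, one with the comma-joined values of each variable.
    static Map<String, Object> build(Map<String, List<String>> variables) {
        List<String> names = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : variables.entrySet()) {
            names.add(entry.getKey());
            values.add(String.join(",", entry.getValue()));
        }
        Map<String, Object> params = new LinkedHashMap<>();
        params.put(VAR_NAMES_KEY, names.toArray(new String[0]));
        params.put(VAR_VALUES_KEY, values.toArray(new String[0]));
        return params;
    }
}
```

The resulting map has the same shape as additionalParams above and could be passed as the third argument of Ruta.apply. A LinkedHashMap as input keeps the order of names and values predictable.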

Some comments:

A STRINGLIST is declared in the rules and filled by using the two configuration parameters.
I refactored the inlined rules: no disjunctive composed rule element is required (several rules would do the same), and no multiple rule elements/rules are required.
A block is no longer required in the example, so I removed it.
If there is a problem with the released version of Ruta, the rule needs to be rewritten to use a string variable instead of directly comparing the features in the label expressions.
An approach using an external dictionary would look quite similar, e.g. with an INLIST condition.
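If neither variant works with a given Ruta release, a different fallback is to keep the tags in a Java collection and generate the original disjunctive rule from it as a string. This is not the STRINGLIST approach described above, only plain string building. The sketch below uses the standard library alone; the class AsyndetonRuleBuilder is a hypothetical name, and the types Constituent and Asyndeton are taken from the rule in the question.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of UIMA Ruta.
final class AsyndetonRuleBuilder {
    // Builds one "(tag COMMA tag)" alternative per POS tag and joins them
    // with " | " inside the block rule from the question.
    static String build(List<String> posTags) {
        for (String tag : posTags) {
            if (tag.contains("\"")) {
                // A quote would terminate the string literal inside the generated rule.
                throw new IllegalArgumentException("Invalid POS tag: " + tag);
            }
        }
        String alternatives = posTags.stream()
            .map(tag -> "(Constituent.label==\"" + tag + "\" COMMA Constituent.label==\"" + tag + "\")")
            .collect(Collectors.joining(" | "));
        return "BLOCK(ForEach) Sentence{}{Document{-> Asyndeton} <- {(" + alternatives + ");};}";
    }
}
```

The returned string could then be handed to Ruta.apply(cas, rule) exactly like the hardcoded version, so adding a tag only means adding an element to the list.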

DISCLAIMER: I am a developer of UIMA Ruta.

Answer 1 (accepted), 2017-01-30 13:08:30
