Searching for an item in a text file using UIMA Ruta

Question

I have been trying to search for an item in a text file.

The text file looks like this, for example:

`>HEADING

00345

XYZ

MethodName : fdsafk

Date: 23-4-2012

More text and some part containing instances of XYZ`

So I initially did a dictionary search for XYZ and found its positions, but I want only the first XYZ and not the rest. One property of XYZ is that it always appears between the 5-digit code and the text MethodName.

I am unable to do that.

WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document{-> MARKFAST(Zip, ZipList)};

DECLARE Method;
"MethodName" -> Method;


WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};

Also, how do we use REGEX in UIMA Ruta?

Answer 1

There are many ways to specify this. Here are some examples (not tested):

// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};

// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # @type{-> UNMARK(type)};

// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} @Method;
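The idea behind the last rule (take whatever lies between a five-character number and MethodName) can be tried outside Ruta with an ordinary regular expression. The following is a minimal sketch in plain Python, not Ruta code. The sample text is reconstructed from the question, and the names `pattern` and `item` are chosen here for illustration only.

```python
import re

# Sample text reconstructed from the question (assumed layout: one item per line).
text = """>HEADING
00345
XYZ
MethodName : fdsafk
Date: 23-4-2012
More text and some part containing instances of XYZ
"""

# A standalone 5-digit number, then the shortest possible span,
# then the literal text "MethodName". DOTALL lets the span cross line breaks.
pattern = re.compile(r"\b\d{5}\b\s*(?P<item>.+?)\s*MethodName", re.DOTALL)

match = pattern.search(text)
item = match.group("item") if match else None
print(item)
```

Because the search stops at the first match, later occurrences of XYZ further down in the text are never considered.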

There are two options to use regex in UIMA Ruta:

(find) simple regex rules like "[A-Za-z]+" -> Type;
(matches) REGEXP conditions for validating the match of a rule element like
ANY{REGEXP("[A-Za-z]+")-> Type};
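The difference between the two labels, find and matches, can be illustrated with a generic regex library. This is a rough analogy in plain Python rather than Ruta itself: `finditer` stands in for the "find" behaviour of a regex rule, and `fullmatch` stands in for the "matches" behaviour of a REGEXP condition on an already matched token. The sample string and whitespace tokenization are assumptions made for the example.

```python
import re

pattern = re.compile(r"[A-Za-z]+")
text = "XYZ 00345 abc1"

# "find": scan the whole text and collect every span the pattern hits,
# including letter runs that are only part of a longer token.
found = [m.group() for m in pattern.finditer(text)]

# "matches": each candidate token passes only if the pattern covers all of it.
tokens = text.split()
validated = [t for t in tokens if pattern.fullmatch(t)]

print(found)
print(validated)
```

Here "abc" is found inside "abc1" by the scanning approach, while the whole-token check rejects "abc1" because of the trailing digit.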

Let me know if something is not clear. I will extend the description then.

DISCLAIMER: I am a developer of UIMA Ruta

Answer 1 posted 2016-02-16 14:03:02
