Search for an item in a text file using UIMA Ruta

I have been trying to search for an item in a text file.

The text file looks like this, for example:

>HEADING
00345
XYZ
MethodName : fdsafk
Date: 23-4-2012
More text and some part containing instances of XYZ

So I initially did a dictionary search for XYZ and found the positions, but I want only the first XYZ and not the rest. XYZ has one useful property: it always appears between the 5-digit code and the text MethodName.

I am unable to do that.

WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document{-> MARKFAST(Zip, ZipList)};

DECLARE Method;
"MethodName" -> Method;


WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};

Also, how do we use regular expressions in UIMA Ruta?

There are many ways to specify this. Here are some examples (not tested):

// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};

// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # @type{-> UNMARK(type)};

// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} @Method;

There are two options to use regex in UIMA Ruta:

  • (find) simple regex rules like "[A-Za-z]+" -> Type;
  • (matches) REGEXP conditions for validating the match of a rule element, like
    ANY{REGEXP("[A-Za-z]+")-> Type};
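The "(find)" and "(matches)" labels read like the two modes of java.util.regex, which is plausible since UIMA Ruta is implemented in Java. As a rough illustration in plain Java (not Ruta code, and the class and method names below are made up for this sketch): find scans a text for every matching substring, while matches accepts a candidate only if the pattern covers it completely. That is why a condition like REGEXP(".....") on a NUM token acts as a "exactly five characters" check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only meant to contrast the two regex modes.
public class RegexModes {

    // "find" semantics: collect every substring of the text that the pattern matches.
    public static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // "matches" semantics: true only if the whole candidate string matches the pattern.
    public static boolean matchesWhole(String regex, String candidate) {
        return Pattern.matches(regex, candidate);
    }

    public static void main(String[] args) {
        String text = "00345 XYZ MethodName";
        System.out.println(findAll("[A-Za-z]+", text));          // [XYZ, MethodName]
        System.out.println(matchesWhole("[A-Za-z]+", "XYZ"));    // true
        System.out.println(matchesWhole("[A-Za-z]+", "00345"));  // false
        System.out.println(matchesWhole(".....", "00345"));      // true
    }
}
```

In Ruta terms, the first method corresponds roughly to a simple regex rule that creates an annotation for each hit in the document, and the second to a REGEXP condition checked against the text covered by one rule element.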

Let me know if something is not clear. I will extend the description then.

DISCLAIMER: I am a developer of UIMA Ruta
