How do I run UIMA on a list of files?

I am using org.apache.uima.examples.cpe.SimpleRunCPE in my Java program, which takes as input a CpeDescription XML file.

This file has a nameValuePair for InputDirectory, which points to where the text files are that UIMA should work on.

How would I run a CPE on a specified list of files instead?

The background is that I have a very large number of text files in a directory, and I run UIMA on them to generate the CAS files. If, after days of running, the UIMA process suddenly stops (for example, it crashes with an out-of-heap-memory error, or the computer has to be rebooted), I would like to rerun the process on the remaining unprocessed files only.

How would I go about that?

In your CpeDescription XML file, you will need to modify your CollectionReader to accept a new parameter (e.g. a list of files, or a regex) that filters out the files that have already been processed. Here is some code and tests to get you started.
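As a rough illustration of the filtering step, the sketch below computes which input files still need processing. It uses only the Java standard library and is not part of UIMA. The class name `RemainingFiles` is hypothetical, and the code assumes that each processed file leaves an output file named `<input file name>.xmi` in the output directory; adjust that check to whatever naming your CAS consumer actually uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a UIMA class.
public class RemainingFiles {

    /**
     * Returns the regular files in inputDir that have no matching
     * "<file name>.xmi" file in outputDir, sorted by path.
     * The ".xmi" naming convention is an assumption about the CAS consumer.
     */
    public static List<Path> unprocessed(Path inputDir, Path outputDir) throws IOException {
        try (Stream<Path> files = Files.list(inputDir)) {
            return files
                    .filter(Files::isRegularFile)
                    // Keep only inputs whose expected output does not exist yet.
                    .filter(p -> !Files.exists(outputDir.resolve(p.getFileName() + ".xmi")))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The resulting list could then be handed to the modified CollectionReader, for instance by writing it to a text file whose path is passed in through a new (hypothetical) reader parameter.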

Another workaround is to split your large InputDirectory into smaller subdirectories and run a UIMA CPE on each of them. This way, if one CPE batch fails, you only need to restart that batch.
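The splitting itself can be done with a small standard-library program. The following is a minimal sketch under assumptions: the class name `BatchSplitter`, the `batch-0000` style directory names, and the choice to move (rather than copy) the files are all illustrative, not something UIMA prescribes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not a UIMA class.
public class BatchSplitter {

    /**
     * Moves the regular files in inputDir into subdirectories
     * batch-0000, batch-0001, ... holding at most batchSize files each.
     * Returns the created batch directories in order.
     */
    public static List<Path> split(Path inputDir, int batchSize) throws IOException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // Collect the file list first so that moving files does not disturb the listing.
        List<Path> files;
        try (Stream<Path> stream = Files.list(inputDir)) {
            files = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> batches = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            if (i % batchSize == 0) {
                // Start a new batch directory every batchSize files.
                Path dir = inputDir.resolve(String.format("batch-%04d", i / batchSize));
                batches.add(Files.createDirectories(dir));
            }
            Path file = files.get(i);
            Files.move(file, batches.get(batches.size() - 1).resolve(file.getFileName()));
        }
        return batches;
    }
}
```

Each returned directory can then be used as the InputDirectory value for a separate CPE run.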
