
Lucene 7 is not consistent with Lucene 6 regarding adding TextField(field, TokenStream) to a document

I'm adding a field to a document using TextField with TokenStream, using the following code:

TokenFilter contentFilter = createContentFilter(body);
Field contentField = new TextField("fieldName", contentFilter);
doc.add(contentField);

This used to work fine with Lucene 6. However, after moving to Lucene 7, I get the following error when adding the document to the index:

java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=39,lastStartOffset=36 for field 'content'
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:767)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)

What has changed in Lucene 7 that breaks my code? Thanks, David

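Well, the answer is that Lucene 7 enforces rules on token offsets that were not enforced before. As the error message indicates, a token's start offset must be non-negative and must not be smaller than the start offset of the previous token (lastStartOffset), and its end offset must be greater than or equal to its start offset. This rules out using a token's offset attribute to represent an arbitrary span within the text, for example a span that reaches back to offset 0 after a token that started at offset 36. This is sad, very sad :)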
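To make the rule concrete, here is a minimal sketch in plain Java that mirrors the condition described in the error message. It does not use Lucene at all: the class OffsetCheck, the method firstViolation, and the int[][] representation of (startOffset, endOffset) pairs are all hypothetical names chosen for illustration, and the real check lives inside DefaultIndexingChain.

```java
public class OffsetCheck {

    /**
     * Hypothetical helper, not part of Lucene.
     * Each row of offsets is {startOffset, endOffset} for one token, in stream order.
     * Returns the index of the first token that breaks the rule, or -1 if none does.
     */
    public static int firstViolation(int[][] offsets) {
        int lastStartOffset = 0; // assumed initial value; also rejects negative starts
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            // Same condition as in the exception text: offsets must not go
            // backwards, and endOffset must be >= startOffset.
            if (start < lastStartOffset || end < start) {
                return i;
            }
            lastStartOffset = start;
        }
        return -1;
    }

    public static void main(String[] args) {
        // The case from the stack trace: a token starting at 36 is followed
        // by a token with startOffset=0, endOffset=39.
        int[][] failing = { {36, 39}, {0, 39} };
        System.out.println(firstViolation(failing));

        // Overlapping tokens are accepted as long as start offsets never decrease.
        int[][] passing = { {0, 5}, {0, 39}, {6, 10} };
        System.out.println(firstViolation(passing));
    }
}
```

Running a check like this over the offsets produced by a custom TokenFilter, before handing it to TextField, may help locate the token that triggers the exception. One possible workaround, if span information is still needed, is to emit tokens in order of increasing start offset and carry the span in a separate attribute or payload; whether that fits depends on what createContentFilter does.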
