Lucene 7 is not consistent with Lucene 6 regarding adding TextField(field, TokenStream) to a document

I'm adding a field to a document using TextField with a TokenStream, using the following code:

TokenFilter contentFilter = createContentFilter(body);
Field contentField = new TextField("fieldName", contentFilter);
doc.add(contentField);

This used to work well with Lucene 6. However, after moving to Lucene 7, I get the following error when adding the document to the index:

java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=39,lastStartOffset=36 for field 'content'
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:767)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)

What has changed in Lucene 7 that breaks my code? Thanks, David

Well, the answer is that in Lucene 7, token offsets must satisfy constraints that were not enforced here before. As the error message spells out, a token's start offset must be non-negative, its end offset must be greater than or equal to its start offset, and offsets must not go backwards: the message compares each start offset against lastStartOffset, the start offset of the preceding token. This rules out using the token's offset attribute to represent an arbitrary span within the text. This is sad, very sad :)
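To make the rule concrete, here is a minimal sketch in plain Java that applies the three conditions named in the exception message to a list of token offsets. It is an illustration under assumptions, not Lucene's code: the class OffsetOrderCheck and the method firstViolation are hypothetical names, and the comparison against the previous start offset is taken from a literal reading of the message (lastStartOffset).

```java
// Hypothetical helper, not part of Lucene. It mirrors the conditions listed in
// the exception message: startOffset >= 0, endOffset >= startOffset, and the
// start offset must not be smaller than the previous token's start offset.
final class OffsetOrderCheck {

    // Each row of offsets is {startOffset, endOffset} for one token, in stream order.
    // Returns the index of the first token that breaks a condition, or -1 if none does.
    static int firstViolation(int[][] offsets) {
        int lastStartOffset = 0;
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            if (start < 0 || end < start || start < lastStartOffset) {
                return i;
            }
            lastStartOffset = start;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Start offsets only move forward, so every token passes the check.
        System.out.println(firstViolation(new int[][] {{0, 5}, {3, 10}, {36, 39}}));
        // A token starting at 0 after one starting at 36 is flagged, matching
        // startOffset=0, endOffset=39, lastStartOffset=36 in the trace above.
        System.out.println(firstViolation(new int[][] {{36, 39}, {0, 39}}));
    }
}
```

Running a filter's output through a check like this before indexing can show which token carries the backward-moving span.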
