I have some questions about the Apache Lucene library:
1) How can I concatenate two TokenStream objects into one TokenStream object?
2) Which filter can be used to remove all duplicate tokens (tokens with the same value) from a TokenStream object?
Thanks in advance.
As for concatenating tokens from two sources, just add two Field instances with the same name to the Document. This is guaranteed to be the same as a single field whose value is the two values concatenated.
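To make the equivalence concrete, here is a minimal plain-Java sketch of the idea. It is a toy model, not Lucene code: the class MultiValuedFieldSketch, its whitespace "analyzer", and the sample strings are all assumptions made up for illustration. It only shows that analyzing the values of a same-named field one after another yields the same token sequence as analyzing the concatenated value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: this is NOT Lucene code. It mimics how the values of
// same-named fields are analyzed one after another into one token sequence.
class MultiValuedFieldSketch {

    // Hypothetical stand-in for an analyzer: lowercases and splits on whitespace.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Tokens of a field that was added several times under the same name:
    // each value is analyzed in turn and its tokens are appended in order.
    static List<String> tokensOf(List<String> fieldValues) {
        List<String> all = new ArrayList<>();
        for (String value : fieldValues) {
            all.addAll(analyze(value));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> twoFields = tokensOf(Arrays.asList("Quick brown", "fox jumps"));
        List<String> oneField = tokensOf(Arrays.asList("Quick brown fox jumps"));
        System.out.println(twoFields);
        System.out.println(twoFields.equals(oneField));
    }
}
```

In Lucene itself, the analyzer's position increment gap (0 by default, as far as I know) decides whether phrase queries can match across the boundary between the two values, so check that setting if positions matter to you.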
As for eliminating duplicate terms, this is not really necessary. Lucene only counts a term's frequency within a document so that documents containing the term more often score higher. If you don't need that, you can define your own Similarity instance that implements tf as a constant of 1.
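The effect of a constant tf can be sketched in plain Java. The class TfSketch below is a hypothetical stand-in for the tf hook of a Similarity, not the Lucene class itself. It compares square-root damping, which is what Lucene's classic default similarity uses, with a constant of 1. In a real index you would override tf in your own Similarity subclass, and the exact class to extend depends on your Lucene version.

```java
// Hypothetical stand-in for the tf hook of a Lucene Similarity; not Lucene code.
abstract class TfSketch {

    // Maps a raw within-document term frequency to a score factor.
    abstract float tf(float freq);

    // Square-root damping, as in Lucene's classic default similarity.
    static final TfSketch SQRT = new TfSketch() {
        @Override
        float tf(float freq) {
            return (float) Math.sqrt(freq);
        }
    };

    // Constant tf: a term that occurs at all contributes 1,
    // no matter how many times it is repeated in the document.
    static final TfSketch CONSTANT = new TfSketch() {
        @Override
        float tf(float freq) {
            return freq > 0 ? 1.0f : 0.0f;
        }
    };

    public static void main(String[] args) {
        for (int freq : new int[] {1, 4, 9}) {
            System.out.println(freq + " occurrence(s): sqrt tf = " + SQRT.tf(freq)
                    + ", constant tf = " + CONSTANT.tf(freq));
        }
    }
}
```

With the constant variant, a document that repeats a term nine times gets the same tf factor as one that contains it once, which is why removing duplicate tokens before indexing adds nothing for scoring purposes.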
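Or, if you only need to disable term frequency for a single field, you can omit term frequencies when indexing that field, for example by calling setOmitTermFreqAndPositions(true) on the Field in Lucene 3.x (newer versions configure this through the field's index options). Note that instantiating the Field with Field.TermVector.NO only turns off stored term vectors; it does not change how term frequency is counted for scoring.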