
Apache Lucene TokenStream Filters

I have some questions regarding the Apache Lucene library:

1) How can I concatenate two TokenStream objects into one TokenStream object?

2) Which filter can be used to remove all duplicate tokens (tokens with the same value) from a TokenStream object?

Thanks in advance.

To concatenate tokens from two sources, just add two Field instances with the same name to the Document. Lucene indexes them as a single field whose value is the two values concatenated, in the order the fields were added.

As for eliminating duplicate terms, this is not really necessary. Duplicates only raise a term's frequency within the document, and Lucene uses that frequency solely to score the document higher for that term. If you don't need that, you can define your own Similarity instance that implements tf as a constant of 1.

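Or, if you need to disable term frequency for a single field only, omit frequencies through that field's index options: setOmitTermFreqAndPositions(true) on the Field in Lucene 3.x, or IndexOptions.DOCS_ONLY in later versions. This also drops positions, so phrase queries on that field stop working. Note that Field.TermVector.NO only turns off term vector storage; it does not change how term frequencies are indexed.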
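To make the effect of term frequency concrete, here is a minimal standalone sketch in plain Java. It does not use the Lucene API: the class TfSketch and its methods are hypothetical names chosen for illustration. It assumes the sqrt(freq) tf factor used by Lucene's classic TF-IDF similarity and contrasts it with a constant tf, which makes repeated tokens irrelevant to the score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
public class TfSketch {

    // Count how often each token occurs, the way an index tracks
    // term frequency within a single document.
    public static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : tokens) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    // Assumed default: the classic TF-IDF tf factor, sqrt(freq).
    public static double classicTf(int freq) {
        return Math.sqrt(freq);
    }

    // Constant tf: a term that is present contributes 1,
    // no matter how many times it repeats.
    public static double constantTf(int freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Model two values of the same field: tokens are appended in order.
        List<String> tokens = new ArrayList<>(Arrays.asList("quick", "fox"));
        tokens.addAll(Arrays.asList("fox", "fox", "jumps"));

        int freq = termFrequencies(tokens).get("fox"); // "fox" occurs 3 times
        System.out.println("classic tf:  " + classicTf(freq));
        System.out.println("constant tf: " + constantTf(freq));
    }
}
```

With a constant tf, a document that repeats "fox" three times gets the same tf contribution as one that mentions it once, which is the same outcome that removing duplicate tokens would have on scoring.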

