
Apache Lucene TokenStream Filters

Posted 2012-08-23 19:41:48. Tags: java, lucene, machine-learning

I have some questions regarding the Apache Lucene library:

1) How can I concatenate two TokenStream objects into one TokenStream object?

2) Which filter can be used to remove all duplicate tokens (those with the same value) from a TokenStream object?

Thanks in advance.

1 answer

To concatenate tokens from two sources, just add two Field instances with the same name to the Document. Lucene indexes their tokens in the order the fields were added, so the result is the same as a single field holding the concatenated value (assuming the analyzer's position increment gap is left at its default of 0).
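The equivalence described above can be illustrated with a small plain-Java model. This is only a sketch and does not use the Lucene API: the class SameNameFieldModel and its methods are hypothetical names, and whitespace splitting stands in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model (NOT the Lucene API): values of same-named fields
// contribute their tokens to one sequence, in the order they were added.
class SameNameFieldModel {

    // Assumption: whitespace splitting stands in for a real Analyzer.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Collects the tokens of every field value, preserving insertion order.
    static List<String> indexedTokens(List<String> fieldValues) {
        List<String> all = new ArrayList<>();
        for (String value : fieldValues) {
            all.addAll(tokenize(value));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> twoFields = indexedTokens(Arrays.asList("quick brown", "fox jumps"));
        List<String> oneField = indexedTokens(Arrays.asList("quick brown fox jumps"));
        System.out.println(twoFields.equals(oneField)); // prints "true"
    }
}
```

In real Lucene code the two values would be two Field instances sharing one name; the model only shows why the resulting token sequence matches the concatenated case.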

As for eliminating duplicate terms, this is not really necessary. Lucene only counts repeated terms as the term frequency within a document, which it uses to score that document higher. If you don't need that, you can define your own Similarity implementation whose tf always returns 1.
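To make the effect of a constant tf concrete, here is a minimal plain-Java sketch. It does not call Lucene: TfModel and its methods are hypothetical, and sqrtTf mirrors the square-root term-frequency factor used by Lucene's classic TF-IDF similarity.

```java
// Plain-Java model (NOT the Lucene API) of the term-frequency factor in scoring.
class TfModel {

    // Classic Lucene-style tf: grows with the square root of the raw count.
    static float sqrtTf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Constant tf as suggested in the answer: 1 when the term is present at all.
    static float constantTf(int freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Counts how often a term occurs in a token sequence.
    static int count(String[] tokens, String term) {
        int n = 0;
        for (String t : tokens) {
            if (t.equals(term)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] tokens = {"lucene", "token", "lucene", "lucene", "lucene"};
        int freq = count(tokens, "lucene");
        System.out.println(sqrtTf(freq));     // prints "2.0"
        System.out.println(constantTf(freq)); // prints "1.0"
    }
}
```

With the constant factor, four occurrences score the same as one, so removing duplicates from the token stream would not change the ranking.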

Or, if you need to disable term frequency for a single field only, index that Field with term frequencies omitted (IndexOptions.DOCS_ONLY in Lucene 3.4 through 4.x). Field.TermVector.NO by itself only disables term vectors and leaves term frequencies in place.
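If duplicate tokens really must be dropped from the stream (question 2), note that Lucene's built-in RemoveDuplicatesTokenFilter only removes tokens that repeat at the same position, so stream-wide deduplication needs a custom filter. The core logic of such a filter might look like the sketch below. It is plain Java, not a TokenFilter subclass, and DedupTokensModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model (NOT a Lucene TokenFilter): keep the first occurrence
// of each token value and skip every later repeat.
class DedupTokensModel {

    static List<String> dropRepeats(Iterable<String> tokens) {
        Set<String> seen = new HashSet<>();   // values already emitted
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Set.add returns false when the value was already present.
            if (seen.add(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

A real filter would apply the same "seen" check inside its token-advancing method and would need to clear the set whenever the stream is reset.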

Related questions

Apache Lucene TokenStream contract violation

Adding tokens to a lucene tokenstream

Lucene Highlighter TokenStream exception

Lucene Customize TokenStream

Lucene TokenStream Exception

java.lang.VerifyError: class org.apache.lucene.analysis.ReusableAnalyzerBase overrides final method tokenStream

Apache Lucene: How to use TokenStream to manually accept or reject a token when indexing

Lucene 4.0 overrides final method tokenStream

Java | Lucene | TokenStream fields cannot be stored

How to remove numbers from TokenStream in Lucene?


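Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.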


粤ICP备18138465号 © 2020-2024 STACKOOM.COM