
Lucene Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity between documents. Say my documents contain these three terms: "owe", "owed", and "owing". Lucene treats them as three separate terms, even though all three mean the same thing, "owe". Is there any functionality in Lucene for indexing by semantics, so that "owe", "owed", and "owing" are indexed as the single term "owe" with a term frequency of 3?

If not, I'd welcome any suggestions for achieving this.

You can use the SnowballFilter with an EnglishStemmer, added as a token filter in the analyzer you index with. It will replace those verbs with their root form (in your example that would be owe, or maybe ow), so all three end up as the same indexed term.
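To make the effect on the term frequency vector concrete, here is a minimal, dependency-free sketch. It does not call Lucene at all: `toyStem` is a hypothetical stand-in with three hard-coded suffix rules, not the Snowball algorithm, and the class and method names are invented for illustration. It only shows what changes once inflected forms collapse to one stem before counting. In this toy the stem happens to be "ow"; a real stemmer may well produce "owe".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class StemmedTermFrequency {

    // Hypothetical toy stemmer (an assumption for this sketch, NOT Snowball):
    // strips "ing" or "ed", then a trailing "e".
    public static String toyStem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.endsWith("ing") && t.length() > 4) {
            t = t.substring(0, t.length() - 3);
        } else if (t.endsWith("ed") && t.length() > 3) {
            t = t.substring(0, t.length() - 2);
        }
        if (t.endsWith("e") && t.length() > 2) {
            t = t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Builds a term frequency vector keyed by stem rather than by surface form.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            tf.merge(toyStem(token), 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two sparse term frequency vectors.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // All three surface forms fall into one bucket with frequency 3.
        System.out.println(termFrequencies("owe owed owing")); // prints {ow=3}
        // Two documents that share no surface form now match on the stem.
        System.out.println(cosine(termFrequencies("owed"), termFrequencies("owing"))); // prints 1.0
    }
}
```

Without the stemming step, the documents "owed" and "owing" would share no terms and their cosine similarity would be 0. In Lucene the same collapse would happen inside the analyzer, so the stored term vectors already contain the stemmed terms and the similarity code itself does not need to change.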


 