
Store documents based on sort order in Lucene index

Original 2011-08-09 08:22:17 5 3 java/ .net/ lucene

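I have two fields in my index (name, modifiedDate). I want to store new documents based on modifiedDate and keep the index sorted on modifiedDate:
doc #1 is the oldest document, with the oldest modifiedDate
doc #n is the most recent document, with a modifiedDate close to now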

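1) How can I create an index structure in which the physical storage of documents is ordered by modifiedDate, and keep that structure after any change to the index (optimize, delete, update)?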

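2) This structure would let me search for documents within a specific date range. But I don't want to search the whole index and then filter. I want to use the structure to skip all remaining documents once they fall outside the date range.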

Current Lucene behavior

for (1 to docCount)
    if (modifiedDate is within the date range filter)
        compute score based on query

Desired behavior

for (1 to docCount)
    if (modifiedDate is greater than the upper bound of the date range)
        break
    else
        compute score based on query

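If I have 3,000,000 documents and only the top 20 documents satisfy my date range, the current Lucene behavior has to check every document, whereas the desired behavior only touches the first 20. You can guess the huge performance gain.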
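The desired behavior can be sketched in plain Java, outside Lucene. This is only an illustration under the question's assumption that doc IDs are ordered by ascending modifiedDate. The class and method names (`EarlyStopScan`, `scan`) are made up for the example, and the dates are arbitrary long values.

```java
import java.util.ArrayList;
import java.util.List;

final class EarlyStopScan {
    /** Result of a scan: which doc IDs matched and how many docs were examined. */
    static final class Result {
        final List<Integer> matches = new ArrayList<>();
        int examined = 0;
    }

    /**
     * Scans documents whose modifiedDate values ascend with the doc ID
     * and stops at the first document past the upper bound.
     */
    static Result scan(long[] modifiedDates, long lower, long upper) {
        Result r = new Result();
        for (int docId = 0; docId < modifiedDates.length; docId++) {
            r.examined++;
            long date = modifiedDates[docId];
            if (date > upper) {
                break; // every later doc is newer still, so none can match
            }
            if (date >= lower) {
                r.matches.add(docId); // a real search would score the doc here
            }
        }
        return r;
    }

    public static void main(String[] args) {
        long[] dates = {100, 200, 300, 400, 500, 600};
        Result r = scan(dates, 150, 350);
        System.out.println(r.matches + " after examining " + r.examined + " docs");
    }
}
```

The break is only safe while the ordering assumption holds. If an update or merge reorders documents, the scan would silently miss matches, which is why question 1) matters.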

3 answers

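The existing answers are fine, but Lucene 4.3.0, released this year, introduced a new "SortingMergePolicy" that lets advanced Lucene users terminate searches early using the algorithm suggested by the original poster. See the javadocs.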
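As a rough illustration of why sorted segments allow early termination, here is a plain-Java sketch. It does not use the Lucene API. It assumes each segment is already sorted ascending by modifiedDate, and `SortedSegmentsSketch` and `oldest` are hypothetical names. To find the k oldest documents overall, at most k entries need to be read from each segment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortedSegmentsSketch {
    /**
     * Returns the k smallest modifiedDate values across all segments,
     * reading at most k entries from each segment. Assumes every segment
     * is already sorted ascending.
     */
    static List<Long> oldest(long[][] segments, int k) {
        List<Long> candidates = new ArrayList<>();
        for (long[] segment : segments) {
            int limit = Math.min(k, segment.length);
            for (int i = 0; i < limit; i++) {
                candidates.add(segment[i]); // stop after k docs in this segment
            }
        }
        // Merge the per-segment candidates and keep the overall top k.
        Collections.sort(candidates);
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

The work per segment is bounded by k rather than by the segment size, which is where the saving comes from on a large index.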

Lucene indexes and queries numeric fields efficiently; see NumericRangeQuery. The javadoc I linked above has notes on the TrieRangeQuery implementation.

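You can store modifiedDate as a NumericField holding the modification date in milliseconds. Then use a NumericRangeFilter (or a QueryWrapperFilter around a NumericRangeQuery) to restrict the search to the appropriate date range.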

This should be very efficient.
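To make the "date as milliseconds" idea concrete, here is a small plain-Java sketch. It is not Lucene's trie-based implementation. It only shows converting a timestamp to epoch milliseconds and selecting an inclusive range from sorted long values with binary search instead of a full scan. `DateRangeSketch` and its methods are hypothetical names.

```java
import java.time.Instant;

final class DateRangeSketch {
    /** Converts an ISO-8601 instant such as "2011-08-09T08:22:17Z" to epoch milliseconds. */
    static long toMillis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    /** Index of the first element >= key in a sorted array. */
    static int lowerBound(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Counts values in [lower, upper], both inclusive, without scanning the whole array. */
    static int countInRange(long[] sorted, long lower, long upper) {
        if (lower > upper) {
            return 0;
        }
        int from = lowerBound(sorted, lower);
        // Guard against overflow when upper is already the largest long.
        int to = (upper == Long.MAX_VALUE) ? sorted.length : lowerBound(sorted, upper + 1);
        return to - from;
    }
}
```

Storing the date as a single long is what makes range comparisons cheap; string-formatted dates would need parsing or careful lexicographic formatting to compare correctly.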

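You can sort the results by modifiedDate; see this answer: How to sort Lucene results by field value using a HitCollector?
If you are really adventurous, you can do some scoring customization. http://lucene.apache.org/java/3_3_0/scoring.html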

HTH
