How do you configure Lucene in Sitecore to only index the latest version of an item on the master db?
I recognise this is a moot point on the web database, so this question applies to the master db.
I have a custom index set up in Sitecore 6.4.1 as follows:
    <index id="search_content_US" type="Sitecore.Search.Index, Sitecore.Kernel">
      <param desc="name">$(id)</param>
      <param desc="folder">_search_content_US</param>
      <Analyzer ref="search/analyzer" />
      <locations hint="list:AddCrawler">
        <search_content_home type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
          <Database>master</Database>
          <Root>/sitecore/content/usa home</Root>
          <Tags>home content</Tags>
        </search_content_home>
      </locations>
    </index>
I query the index like this (I am using techphoria414's SortableIndexSearchContext from this answer: How to sort/filter using the new Sitecore.Search API):
    private SearchHits GetSearchResults(SortableIndexSearchContext searchContext, string searchTerm)
    {
        CombinedQuery query = new CombinedQuery();
        query.Add(new FullTextQuery(searchTerm), QueryOccurance.Must);
        return searchContext.Search(query, Sort.RELEVANCE);
    }
    ...
    SearchHits hits = GetSearchResults(searchContext, searchTerm);
hits is a collection of search hits from my index. When I iterate through hits I can see that there are many duplicates of the same items in Sitecore, one per version of the item.
I then do the following to get a SearchResultCollection:
    SearchResultCollection results = hits.FetchResults(0, hits.Length);
This combines all of the duplicates into a single SearchResult object. This object represents one version of a particular item, and has a property called SubResults, which is a collection of SearchResult objects that represent all of the other item versions.
Here's my problem:
The version of the item represented by the SearchResult is NOT the current published version of the item! It appears to be a randomly selected version (whichever one the search method hit first in the index). The latest version is included in the SubResults collection, however.
E.g.:
SearchResult
|
|- Version 8 // main result
...
|- SubResults
|
|- Version 9 // latest version
|- Version 3
|- Version 5
... // all versions in random order
How do I prevent this from happening on the master db? Either by preventing Lucene from indexing old versions of items, or by doing some manipulation of the result set to get the latest version from the SubResults?
As an aside, why does Lucene bother to index old versions of items anyway? Surely this is pointless for searching content on your website, as the old versions are not visible?
You can implement a custom crawler that overrides the following:
    using Sitecore.Data.Items;
    using Sitecore.Search.Crawlers;

    public class IndexCrawler : DatabaseCrawler
    {
        protected override void IndexVersion(Item item, Item latestVersion, Sitecore.Search.IndexUpdateContext context)
        {
            // Skip every version except the latest one.
            if (item.Versions.Count > 0 && item.Version.Number != latestVersion.Version.Number)
                return;
            base.IndexVersion(item, latestVersion, context);
        }
    }
This ensures that only the latest version of an item gets into your index, so it will be the only version pulled out of that index.
You would need to update your configuration file so that the crawler in your index definition uses this type, of course.
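As a rough sketch of that configuration change, the crawler element inside the index's locations list would point at the custom class instead of the stock DatabaseCrawler. The namespace and assembly name used here (MyProject.Search, MyProject) are placeholders and not part of the original setup; substitute wherever IndexCrawler is actually compiled.

```xml
<locations hint="list:AddCrawler">
  <!-- "MyProject.Search.IndexCrawler, MyProject" is a hypothetical type and assembly name -->
  <search_content_home type="MyProject.Search.IndexCrawler, MyProject">
    <Database>master</Database>
    <Root>/sitecore/content/usa home</Root>
    <Tags>home content</Tags>
  </search_content_home>
</locations>
```

The index will most likely need a rebuild afterwards, since documents for old versions that were indexed earlier stay in place until then.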
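In Sitecore 7, the field _latestversion was added to the index, containing "1" for the latest version (other versions have an empty value).
If you let Lucene search in your web database instead of master, it should only index the last published version.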
<Database>web</Database>
Although the solution provided by theyetiman, using an adjusted sort mechanism, is an interesting approach, it does not provide a perfect solution when the Lucene result scores for the two versions differ. E.g. given v1 with score 0.7 and v2 with score 0.5, his solution will still return the first version of the item. (At least in my tests.)
After some more digging, the most obvious solution apparently lies in implementing your own Sitecore.Pipelines.Search.SearchSystemIndex and using that one instead of the default. If you decompile that code using ILSpy or similar, you will notice the following at the bottom of the Process method:
    foreach (SearchResult current in searchHits.FetchResults(0, searchHits.Length))
    {
        // ...
    }
Each such SearchResult is actually a group, where the first result that was returned from Lucene (thus the one with the highest score) is the main result. Hits on other versions (and also other languages) of the same item are accessible through the SubResults property of each instance, which is null when there are none.
Depending on your requirements, you can adjust this part of the class to fit your needs.
Whilst I haven't figured out the exact answer (to stop Lucene indexing old versions on the master db), I have come up with an acceptable workaround.
When Lucene returns its results from the index, each hit has a field called "_id" which is formatted something like this (3 versions of the same item, where the last number is the version):
"CCB75380-4E9A-4921-99EC-65E532E330FF%en%1"
"CCB75380-4E9A-4921-99EC-65E532E330FF%en%2"
"CCB75380-4E9A-4921-99EC-65E532E330FF%en%3"
...
I'm currently sorting by Sort.RELEVANCE, which is the default. This would be fine if we only had one version of an item in the index, but with several almost identical versions, they all have the same relevance score and Lucene just churns them out in any order. Sitecore then takes the first instance of the item version (even if it's old).
The solution is to specify a secondary sort field. In the searchContext.Search() method, you can pass a custom Sort object:
    searchContext.Search(query, new Sort(...));
By sorting by Lucene's built-in Sort.RELEVANCE first, and then by the _id field (descending) in the index, I can ensure that the first hit that Sitecore sees will be the latest version and not just a random one:
    searchContext.Search(query, new Sort
        (
            new SortField[2]
            {
                SortField.FIELD_SCORE,                       // equivalent to Sort.RELEVANCE
                new SortField("_id", SortField.STRING, true) // sort by _id, descending
            }
        )
    );
The SortField parameters are as follows:
    SortField(string fieldName, int type, bool reverse)
This approach has fixed my problem, but if anyone can actually find out how to only index the latest version, please answer!
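One caveat worth checking with this workaround: a STRING sort field compares values character by character, so once an item reaches ten or more versions, "...%en%9" sorts above "...%en%10" in descending order. The standalone sketch below only illustrates that ordering on the "_id" format shown above; it does not touch the Sitecore or Lucene APIs, and VersionOf is a hypothetical helper introduced for the illustration.

```csharp
using System;
using System.Linq;

static class VersionSortDemo
{
    // Hypothetical helper: reads the version number after the last '%' of an "_id" value.
    static int VersionOf(string id)
    {
        return int.Parse(id.Substring(id.LastIndexOf('%') + 1));
    }

    static void Main()
    {
        string[] ids =
        {
            "CCB75380-4E9A-4921-99EC-65E532E330FF%en%1",
            "CCB75380-4E9A-4921-99EC-65E532E330FF%en%9",
            "CCB75380-4E9A-4921-99EC-65E532E330FF%en%10"
        };

        // Descending character-by-character order, as a string sort would produce.
        var byString = ids.OrderByDescending(id => id, StringComparer.Ordinal);

        // Descending order on the parsed version number.
        var byVersion = ids.OrderByDescending(VersionOf);

        Console.WriteLine(string.Join(", ", byString.Select(VersionOf)));  // prints "9, 10, 1"
        Console.WriteLine(string.Join(", ", byVersion.Select(VersionOf))); // prints "10, 9, 1"
    }
}
```

If items can have ten or more versions, picking the highest parsed version number out of the main result and its SubResults may be a safer way to find the latest one than relying on the string sort alone.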
I ended up figuring out an alternate solution from the above answers.
Architecturally speaking, I think the ideal solution for this problem would be to filter out the older version results using custom code at a higher level rather than removing them from the master database index altogether. You don't want to change the way Sitecore is designed to work just to solve the problem at hand.
Use the predicate below to filter out the older versions and retrieve only the latest version:
    // And returns a new expression, so assign the result back to predicate
    predicate = predicate.And(item => item[Sitecore.ContentSearch.BuiltinFields.LatestVersion].Equals("1"));
Hope this helps someone!