How do you configure Lucene in Sitecore to only index the latest version of an item on the master db?

I recognise this is a moot point on the web database, so this question applies to the master db...

I have a custom index set up in Sitecore 6.4.1 as follows:

<index id="search_content_US" type="Sitecore.Search.Index, Sitecore.Kernel">
    <param desc="name">$(id)</param>
    <param desc="folder">_search_content_US</param>
    <Analyzer ref="search/analyzer" />
    <locations hint="list:AddCrawler">
        <search_content_home type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
            <Database>master</Database>
            <Root>/sitecore/content/usa home</Root>
            <Tags>home content</Tags>
        </search_content_home>
    </locations>
</index>

I query the index like this (I am using techphoria414's SortableIndexSearchContext from this answer: How to sort/filter using the new Sitecore.Search API):

private SearchHits GetSearchResults(SortableIndexSearchContext searchContext, string searchTerm)
{
    CombinedQuery query = new CombinedQuery();
    query.Add(new FullTextQuery(searchTerm), QueryOccurance.Must);
    return searchContext.Search(query, Sort.RELEVANCE);
}

...

SearchHits hits = GetSearchResults(searchContext, searchTerm);

hits is a collection of search hits from my index. When I iterate through hits I can see that there are many duplicates of the same Sitecore items, one per version of the item.

I then do the following to get a SearchResultCollection:

SearchResultCollection results = hits.FetchResults(0, hits.Length);

This combines all of the duplicates into a single SearchResult object. This object represents one version of a particular item, and has a property called SubResults, which is a collection of SearchResult objects that represent all of the other item versions.

Here's my problem:

The version of the item represented by the SearchResult is NOT the current published version of the item! It appears to be a randomly selected version (whichever one the search method hit first in the index). The latest version is included in the SubResults collection, however.

E.g.:

SearchResult
 |
 |- Version 8 // main result
 ...
 |- SubResults
      |
      |- Version 9 // latest version
      |- Version 3
      |- Version 5
      ... // all versions in random order

How do I prevent this from happening on the master db? Either by preventing Lucene from indexing old versions of items, or by manipulating the result set to get the latest version from the SubResults?

As an aside, why does Lucene bother to index old versions of items anyway? Surely this is pointless for searching content on your website, as the old versions are not visible?

You can implement a custom crawler that overrides the following:

public class IndexCrawler : DatabaseCrawler
{
    protected override void IndexVersion(Item item, Item latestVersion, Sitecore.Search.IndexUpdateContext context)
    {
        if (item.Versions.Count > 0 && item.Version.Number != latestVersion.Version.Number)
            return;

        base.IndexVersion(item, latestVersion, context);
    }
}

This ensures that only the latest version of an item gets into your index, and is therefore the only version that can be pulled out of that index.

You would of course need to update your configuration file so that the crawler entry for the index uses this type.
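
A minimal sketch of that configuration change, based on the index definition from the question. Only the type attribute on the crawler element changes; the namespace and assembly name "MyProject.Search.IndexCrawler, MyProject" are placeholders (assumptions), to be replaced with wherever the custom class actually lives:

```xml
<locations hint="list:AddCrawler">
    <!-- Hypothetical namespace and assembly: replace with your own -->
    <search_content_home type="MyProject.Search.IndexCrawler, MyProject">
        <Database>master</Database>
        <Root>/sitecore/content/usa home</Root>
        <Tags>home content</Tags>
    </search_content_home>
</locations>
```

Documents for old versions that were indexed before the change would presumably stay in the index until it is rebuilt.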

In Sitecore 7, the field _latestversion was added to the index. It contains "1" for the latest version (other versions have an empty value).

If you let Lucene search your web database instead of master, it should only index the last published version.

<Database>web</Database>

Although the solution provided by theyetiman, which uses an adjusted sort mechanism, is an interesting approach, it does not give a perfect result when the Lucene scores for two versions differ. E.g. given v1 with score 0.7 and v2 with score 0.5, his solution will still return the first version of the item. (At least in my tests.)

After some more digging, the most obvious solution apparently lies in implementing your own Sitecore.Pipelines.Search.SearchSystemIndex and using that one instead of the default. If you decompile that code using ILSpy or similar, you will notice the following at the bottom of the Process method:

foreach (SearchResult current in searchHits.FetchResults(0, searchHits.Length)){
  // ...
}

Each such SearchResult is actually a group, where the first result that was returned from Lucene (thus the one with the highest score) is the main result. Hits on other versions (and also other languages) of the same item are accessible through the Subresults property of each instance, which is null when there are none.

Depending on your requirements, you can adjust this part of the class to fit your needs.

Whilst I haven't figured out the exact answer (how to stop Lucene indexing old versions on the master db), I have come up with an acceptable workaround...

When Lucene returns its results from the index, each hit has a field called "_id" which is formatted something like this (3 versions of the same item, where the last number is the version):

"CCB75380-4E9A-4921-99EC-65E532E330FF%en%1"
"CCB75380-4E9A-4921-99EC-65E532E330FF%en%2"
"CCB75380-4E9A-4921-99EC-65E532E330FF%en%3"
...

I'm currently sorting by Sort.RELEVANCE, which is the default. This would be fine if there were only one version of an item in the index, but with several almost identical versions they all have the same relevance score, and Lucene just churns them out in any order. Sitecore then takes the first instance of the item version (even if it's old).

The solution is to specify a secondary sort field. In the searchContext.Search() method, you can pass a custom Sort object.

searchContext.Search(query, new Sort(...));

By sorting by Lucene's built-in Sort.RELEVANCE first, and then by the _id field (descending) in the index, I can ensure that the first hit that Sitecore sees will be the latest version and not just a random one:

searchContext.Search(query, new Sort
                            (
                                new SortField[2] 
                                {
                                    SortField.FIELD_SCORE, // equivalent to Sort.RELEVANCE
                                    new SortField("_id",SortField.STRING, true) // sort by _id, descending
                                }
                            )
);

The SortField parameters are as follows:

SortField(string fieldName, int type, bool reverse)
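
One caveat with this sort: _id is compared as a string, so once an item passes nine versions, an id ending in %10 sorts below one ending in %9, and the descending order no longer puts the latest version first. A minimal sketch of picking the latest version numerically instead, assuming the "GUID%language%version" format shown above. IdVersionPicker, ParseVersion and PickLatest are hypothetical helper names for illustration, not part of the Sitecore or Lucene API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdVersionPicker
{
    // Parses the trailing version number from an "_id" value such as
    // "CCB75380-4E9A-4921-99EC-65E532E330FF%en%3".
    // Assumes the version is the text after the last '%'; throws
    // FormatException if that text is not an integer.
    public static int ParseVersion(string id)
    {
        int lastSeparator = id.LastIndexOf('%');
        return int.Parse(id.Substring(lastSeparator + 1));
    }

    // Returns the id with the highest numeric version,
    // or null when the sequence is empty.
    public static string PickLatest(IEnumerable<string> ids)
    {
        return ids.OrderByDescending(id => ParseVersion(id)).FirstOrDefault();
    }
}
```

Given ids for the same item ending in %en%9, %en%10 and %en%2, PickLatest returns the %en%10 entry, whereas a descending string sort would put %en%9 first.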

This approach has fixed my problem, but if anyone can actually find out how to index only the latest version, please answer!

I ended up figuring out an alternative solution based on the above answers.

Architecturally speaking, I think the ideal solution for this problem is to filter out the older version results using custom code at a higher level, rather than removing them from the master database index altogether. You don't want to alter the way Sitecore is designed to work just to solve the problem at hand.

Use the predicate below to filter out the older versions and retrieve only the latest version:

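// And returns a new expression rather than modifying predicate, so assign the result
predicate = predicate.And(item => item[Sitecore.ContentSearch.BuiltinFields.LatestVersion].Equals("1"));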

Hope this helps someone!
