Solr with TYPO3 indexes all kinds of records but does not index pages

Records from the pages table are not indexed in the same way as other records. They represent the individual pages of a website, which are built from other records, so they are indexed by requesting the frontend. Every now and then there are installations where the frontend cannot be indexed: the pages records can be added to the index queue, but every indexing call results in an error.

What is needed to index pages?

Of course you need a connection to the Solr server and a base configuration that activates the Solr indexer, but that should already be working if you can index other records such as news.

You also need some TypoScript configuration, which should be present if you include the static templates from the extension:

plugin.tx_solr {
    index {
        queue {
            pages = 1
            pages {
                initialization = ApacheSolrForTypo3\Solr\IndexQueue\Initializer\Page

                // allowed page types (doktype) when indexing records from table "pages"
                allowedPageTypes = 1,7,4

                indexingPriority = 0

                indexer = ApacheSolrForTypo3\Solr\IndexQueue\PageIndexer
                indexer {
                    // add options for the indexer here
                }

                // Only index standard pages and mount points that are not overlayed.
                additionalWhereClause = (doktype = 1 OR doktype=4 OR (doktype=7 AND mount_pid_ol=0)) AND no_search = 0

                //exclude some html parts inside TYPO3SEARCH markers by classname (comma list)
                excludeContentByClass = typo3-search-exclude

                fields {
                    sortSubTitle_stringS = subtitle
                }
            }
        }
    }
}

But this alone does not get the page content into the index.

What else needs to be configured?

The frontend must be available.
Some server configurations do not allow the server to access its own pages. Make sure the pages can be requested.
If access is not possible via the original domain, you can configure a helper domain through which the Solr indexer can access the pages. Make sure you store the correct domain in the URL of the index entry.
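
A helper domain could be configured on the page indexer roughly like this. This is only a sketch: it assumes the installed EXT:solr version supports the frontendDataHelper options of the page indexer, and the host name is a placeholder, not a value from the question.

```typoscript
plugin.tx_solr.index.queue.pages.indexer {
    // Assumption: frontendDataHelper overrides are available in your EXT:solr version.
    frontendDataHelper {
        scheme = https
        // Hypothetical host name under which the server can reach its own frontend.
        host = index.example.org
    }
}
```

After changing this, check an indexed document to confirm that its URL still points to the public domain and not to the helper domain.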

The pages need the appropriate markers around the relevant content, so that the menus do not spam the index with irrelevant content:
<!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end-->
Without these markers, which may occur multiple times, the whole document is indexed.
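
One possible way to set the markers is to wrap the main content in TypoScript. The following is a minimal sketch: lib.mainContent is a hypothetical object name, and it assumes the main content lives in colPos 0. If your templates are Fluid, you can just as well place the two HTML comments directly in the template.

```typoscript
// Hypothetical content object; adapt the name and colPos to your site package.
lib.mainContent = CONTENT
lib.mainContent {
    table = tt_content
    select {
        orderBy = sorting
        where = {#colPos}=0
    }
    // Only the output between the two markers is considered for the index.
    stdWrap.wrap = <!--TYPO3SEARCH_begin-->|<!--TYPO3SEARCH_end-->
}
```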

But there are some further options that stop indexing:
As seen in the question, the doktype is also taken into account, as is visibility.
Pages have an option Include in Search [no_search], which is exposed to external search engines but is also evaluated by Solr.

Lastly, there is an option that Solr has adopted from indexed_search, but only for the indexing of pages: config.index_enable = 1
Without this option you can index records, but all pages throw an error when they are processed in the index queue.
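
In the TypoScript setup of the site this is a single line. Where exactly you put it (site package or root template) depends on your installation:

```typoscript
// Allow the frontend output of pages to be indexed.
config.index_enable = 1
```

Pages that failed earlier will probably have to be put back into the index queue before they are indexed.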
